Titu Andreescu 
Gabriel Dospinescu 


Straight from the Book 


XYZ 


Press 


Titu Andreescu Gabriel Dospinescu 
University of Texas at Dallas Ecole Normale Supérieure, Lyon 


Library of Congress Control Number: 2012951362 
ISBN-13: 978-0-9799269-3-8 ISBN-10: 0-9799269-3-9 
© 2012 XYZ Press, LLC 


All rights reserved. This work may not be translated or copied in whole or in 
part without the written permission of the publisher (XYZ Press, LLC, 3425 
Neiman Rd., Plano, TX 75025, USA) and the authors except for brief excerpts 
in connection with reviews or scholarly analysis. Use in connection with any 
form of information storage and retrieval, electronic adaptation, computer 
software, or by similar or dissimilar methodology now known or hereafter 
developed is forbidden. The use in this publication of tradenames, trademarks, 
service marks and similar terms, even if they are not identified as such, is not 
to be taken as an expression of opinion as to whether or not they are subject 
to proprietary rights. 


987654321 
www.awesomemath.org 


Cover design by Iury Ulzutuev 


The only way to learn mathematics is to do mathematics. 


-Paul Halmos 


Foreword 


This book is a follow-on from the authors’ earlier book, ‘Problems from 
the Book’. However, it can certainly be read as a stand-alone book: it is not 
vital to have read the earlier book. 

The previous book was based around a collection of problems. In contrast, 
this book is based around a collection of solutions. These are solutions to 
some of the (often extremely challenging) problems from the earlier book. 
The topics chosen reflect those from the first twelve chapters of the previous 
book: so we have Cauchy-Schwarz, Algebraic Number Theory, Formal Series, 
Lagrange Interpolation, to name but a few. 

The book is one of the most remarkable mathematical texts I have ever 
seen. First of all, there is the richness of the problems, and the huge variety 
of solutions. The authors try to give several solutions to each problem, and 
moreover give insight about why each proof is the way it is, in what way the 
solutions differ from each other, and so on. The amount of work that has been 
put in, to compile and interrelate these solutions, is simply staggering. There
is enough here to keep any devotee of problems going for years and years. 

Secondly, the book is far more than a collection of solutions. The solu- 
tions are used as motivation for the introduction of some very clear expositions 
of mathematics. And this is modern, current, up-to-the-minute mathematics. 
For example, a discussion of Extremal Graph Theory leads to the celebrated 
Szemerédi-Trotter theorem on crossing numbers, and to the amazing applica- 
tions of this by Székely and then on to the very recent sum-product estimates 
of Elekes, Bourgain, Katz, Tao and others. This is absolutely state-of-the-art 
material. It is presented very clearly: in fact, it is probably the best exposition 
of this that I have seen in print. 




As another example, the Cauchy-Schwarz section leads on to the developments
of sieve theory, like the Large Sieve of Linnik and the Turán-Kubilius
inequality. And again, everything is incredibly clearly presented. The same 
applies to the very large sections on Algebraic Number Theory, on p-adic 
Analysis, and many others. 

It is quite remarkable that the authors even know so much current math- 
ematics — I do not think any of my colleagues would be so well-informed over 
so wide an area. It is also remarkable that, at least in the areas in which I 
am competent to judge, their explanations of these topics are polished and 
exceptionally well thought-out: they give just the right words to help someone 
understand what is going on. 

Overall, this seems to me like an ‘instant classic’. There is so much 
material, of such a high quality, wherever one turns. Indeed, if one opens the 
book at random (as I have done several times), one is pulled in immediately 
by the lovely exposition. Everyone who loves mathematics and mathematical 
thinking should acquire this book. 


Imre Leader 
Professor of Pure Mathematics 


University of Cambridge 


Preface 


This book is a compilation of many suggestions, much advice, and even
more hard work. Its main objective is to provide solutions to the problems 
which were originally proposed in the first 12 chapters of Problems from the 
Book. We were not able to provide full solutions in our first volume due to the 
lack of space. In addition, the statements of the proposed problems contained 
typos and some elementary mistakes which needed further editing. Finally, 
the problems were also considered to be quite difficult to tackle. With these 
points in mind, we came up with a two-part plan: to correct the identified 
errors and to publish comprehensive solutions to the problems. 

The first task, editing the statements of the proposed problems, was sim- 
ple and has already been completed in the second edition of Problems from the 
Book. Although we focused on changing several problems, we also introduced 
many new ones. The second task, providing full solutions, however, proved to 
be more challenging than expected, so we asked for help. We created a forum 
on www.mathlinks.ro (a familiar site for problem-solving enthusiasts) where 
solutions to the proposed problems were gathered. It was a great pleasure to 
witness the passion with which some of the best problem-solvers on mathlinks 
attacked these tough-nuts. This new book is the result of their common effort, 
and we thank them. 

Providing solutions to every problem within the limited space of one vol- 
ume turned out to be an optimistic plan. Only the solutions to problems 
from the first 12 chapters of the second edition of Problems from the Book are 
presented here. Furthermore, many of the problems are difficult and require 
a rather extensive mathematical background. We decided, therefore, to com- 
plement the problems and solutions with a series of addenda, using various 
problems as starting points for excursions into "real mathematics". Although


we never underestimated the role of problem solving, we strongly believe that 
the reader will benefit more from embarking on a mathematical journey rather 
than navigating a huge list of scattered problems. This book tries to reconcile 
problem-solving with "professional mathematics".


Let us now delve into the structure of the book which consists of 12 chap- 
ters and presents the problems and solutions proposed in the corresponding 
chapters of Problems from the Book, second edition. Many of these problems 
are fairly difficult and different approaches are presented for a majority of 
them. At the end of each chapter, we acknowledge those who provided so- 
lutions. Some chapters are followed by one or two addenda, which present 
topics of more advanced mathematics stemming from the elementary topics 
discussed in the problems of that chapter. 


The first two chapters focus on elementary algebraic inequalities (a no- 
table caveat is that some of the problems are quite challenging) and there is 
not much to comment regarding these chapters except for the fact that al- 
gebraic inequalities have proven fairly popular at mathematics competitions. 
To provide relief from this rather dry landscape (the reader will notice that 
most of the problems in these two chapters start with "Let a,b,c be positive
real numbers"), we included an addendum presenting deep applications of the
Cauchy-Schwarz inequality in analytic number theory. For instance, we dis- 
cuss Gallagher’s sieve, Linnik’s large sieve and its version due to Montgomery. 
We apply these results to the distribution of prime numbers (for instance 
Brun’s famous theorem stating that the sum of the inverses of the twin primes 
converges). The noteworthy Turán-Kubilius inequality and its classical
applications (the Hardy-Ramanujan theorem on the distribution of prime factors
of n, Erdős’ multiplication table problem, Wirsing’s generalization of the prime
number theorem) are also discussed. The reader will be exposed to the power
of the Cauchy-Schwarz inequality in real mathematics and, hopefully, will
understand that the gymnastics of three-variable algebraic inequalities is not
the Holy Grail.


Chapter 3 discusses problems related to the unique factorization of inte- 
gers and the p-adic valuation maps. Among the topics discussed, we cover 
the local-global principle (which is extremely helpful in proving divisibilities 
or arithmetic identities), Legendre’s formula giving the p-adic valuation of n!, 




a beneficial elementary result called lifting the exponent lemma, as well as 
more advanced techniques from p-adic analysis. One of the most beautiful results
presented in this chapter is the celebrated Skolem-Mahler-Lech theorem
concerning the zeros of a linearly recursive sequence. Readers will perhaps 
appreciate the applications of p-adic analysis, which is covered extensively in 
a long addendum. This addendum discusses, from a foundational level, the 
arithmetic of p-adic numbers, a subject that plays a central role in modern 
number theory. Once the basic groundwork has been laid, we discuss the 
p-adic analogues of classical functions (exponential, logarithm, Gamma function)
and their applications to difficult congruences (for instance, Kazandzidis’
famous supercongruence). This serves as a good opportunity to explore the 
arithmetic of Bernoulli numbers, Volkenborn’s theory of p-adic integration, 
Mahler and Amice’s classical theorems characterizing continuous and locally 
analytic functions on the ring of p-adic integers or Morita’s construction of the 
p-adic Gamma function. A second addendum to this chapter discusses various 
classical estimates on prime numbers, which are used throughout the book. 


Chapter 4 discusses problems and elementary topics related to prime num- 
bers of the form 4k + 1 and 4k + 3. The most intriguing problem discussed 
is, without any doubt, Cohn’s renowned theorem characterizing the perfect 
squares in the Lucas sequence. Chapter 5 is dedicated again to the yoga of 
algebraic inequalities and is followed by an addendum discussing applications 
of Hölder’s inequality. 


Chapter 6 focuses on extremal graph theory. Most of the problems revolve 
around Turán’s theorem; however, the reader will also be exposed to topics
such as chromatic numbers, bipartite graphs, etc. This chapter is followed 
by a relatively advanced addendum, discussing various topics related to the 
Szemerédi-Trotter theorem, which gives bounds for the number of incidences 
between a set of points and a set of curves. We discuss the theorem’s classical 
probabilistic proof, its generalization to multi-graphs due to Székely and its 
application to the sum-product problem due to Elekes, as well as more recent 
developments due to Bourgain, Katz, and Tao. These results are then applied 
to natural and nontrivial geometric questions (for instance: what is the least 
number of distinct distances determined by n points in the plane? what is the 
maximal number of triangles of the same area?). Finally, another addendum




completes this chapter and is dedicated to the powerful probabilistic method. 
After a short discussion on finite probability spaces, we provide many examples 
of combinatorial applications. 


Chapter 7 involves combinatorial and number theoretic applications of 
finite Fourier analysis. The central principles of this chapter include the roots 
of unity and the fact that congruences between integers can be expressed in 
terms of sums of powers of roots of unity. To provide the reader with a broader 
view, we beefed up this chapter with an addendum discussing Fourier analysis
on finite abelian groups and applying it to Gauss sums of Dirichlet characters, 
additive problems, combinatorial or analytic number theory. For instance, 
the reader will find a discussion of the Pólya-Vinogradov inequality and Vino- 
gradov’s beautiful use of this inequality to deduce a rather strong bound on 
the least quadratic non-residue mod p. At the same time, we explore Dirich- 
let’s L-functions, culminating in a proof of Dirichlet’s theorem on arithmetic 
progressions. This section is structured to first present the usual analytic proof 
(since it is really a masterpiece from all points of view), up to some important 
simplification due to Monsky. At the same time, we also discuss how to turn 
this into an elementary proof that avoids complex analysis and dramatically 
uses Abel’s summation formula. 


Chapter 8 focuses on diverse applications of generating functions. This 
is an absolutely crucial tool in combinatorics, be it additive or enumerative. 
In this section, the reader will have the opportunity to explore enumerative 
problems (Catalan’s problem, counting the number of solutions of linear dio- 
phantine equations or the number of irreducible polynomials mod p, of fixed 
degree). We also discuss exotic combinatorial identities or recursive sequences, 
which can be solved elegantly using generating functions, but also rather chal- 
lenging congruences that appear so often in number theory. The chapter is 
followed by an addendum presenting a very classical topic in enumerative 
combinatorics, Lagrange’s inversion formula. Among the applications, let us 
mention Abel’s identity, other derived combinatorial identities, Cayley’s the- 
orem on labeled trees, and various related problems. 

Chapter 9 is rather extensive, due to the vast nature of the topic covered, 
algebraic number theory. While we are able only to scratch the surface, nev- 
ertheless the reader will find a variety of intriguing techniques and ideas. For 




instance, we discuss arithmetic properties of cyclotomic polynomials (includ- 
ing Mann’s beautiful theorem on linear equations in roots of unity), rationality 
problems, and various applications of the theorem of symmetric polynomials. 
In addition, we present techniques rooted in the theory of ideals in number 
fields, finite fields and p-adic methods. We also give an overview of the el- 
ementary algebraic number theory in the addendum following this chapter. 
After a brief review on ideals, field extensions, and algebraic numbers, we 
proceed with a discussion of the primitive element theorem and embeddings 
of number fields into C. We also briefly survey Galois theory, and the fun- 
damental theorem on the prime factorization of ideals in number fields, due 
to Kummer and Dedekind. Once the foundation has been set, we discuss the 
prime factorization in quadratic and cyclotomic fields and apply these tech- 
niques to basic problems that explore the aforementioned theories. Finally, we 
discuss various applications of Bauer’s theorem and of Chebotarev’s theorem. 
The next addendum is concerned with the fascinating topic of counting the 
number of solutions modulo p of systems of polynomial equations. We use 
this as an opportunity to state and prove the basic structural results on finite 
fields and introduce the Gauss and Jacobi sums. We then count the
number of points over a finite field of a diagonal hypersurface and compute
its zeta function. This is a beautiful theorem of Weil, the very tip of a massive 
iceberg. 

Chapter 10 focuses on the arithmetic of polynomials with integer coefficients.
An essential aspect of the discussion concentrates on Mahler expansions,
the theory of finite differences, and their applications. The techniques
used in this chapter are rather diverse. Although the problems can be consid- 
ered basic, they are challenging and require advanced problem-solving skills. 


Chapter 11 provides respite from the difficult tasks mentioned above. It 
discusses Lagrange’s interpolation formula, allowing a unified presentation of 
various estimates on polynomials. 


The longest and certainly most challenging chapter is the last one. It 
explores several algebraic techniques in combinatorics. The methods are stan- 
dard but powerful. The last part of the chapter deals with applications to 
geometry, presenting some of Dehn’s wonderful ideas. The last problem pre- 
sented in the book is the famous Freiling, Laczkovich, Rinne, Szekeres theorem, 




a stunningly beautiful application of algebraic combinatorics. 

We would like to thank, again, the members of the mathlinks site for 
their invaluable contribution in providing solutions to many of the problems 
in this book. Special thanks are due to Richard Stong, who did a remarkable 
job by pointing out many inaccuracies and suggesting numerous alternative 
solutions. We would also like to thank Joshua Nichols-Barrer, Kathy Cordeiro 
and Radu Sorici, who gave the manuscript a readable form and corrected 
several infelicities. Many of the problems and results in this book were used by 
the authors in courses at the AwesomeMath Summer Program, and students’ 
reactions guided us in the process of simplifying or adding more details to the 
discussed problems. We wish to thank them all, for their courage in taking 
and sticking with these courses, as well as for their valuable suggestions. 


Titu Andreescu Gabriel Dospinescu 
titu.andreescu@utdallas.edu gdospi2002@yahoo.com


Contents 


1 Some Useful Substitutions
1.1 The relation $a^2 + b^2 + c^2 = abc + 4$
1.2 The relations $abc = a + b + c + 2$ and $ab + bc + ca + 2abc = 1$
1.3 The relation $a^2 + b^2 + c^2 + 2abc = 1$
1.4 Notes

2 Always Cauchy-Schwarz...
2.1 Notes
Addendum 2.A Cauchy-Schwarz in Number Theory

3 Look at the Exponent
3.1 Introduction
3.2 Local-global principle
3.3 Legendre’s formula
3.4 Problems with combinatorial and valuation-theoretic aspects
3.5 Lifting exponent lemma
3.6 p-adic techniques
3.7 Miscellaneous problems
3.8 Notes
Addendum 3.A Classical Estimates on Prime Numbers
Addendum 3.B An Introduction to p-adic Numbers

4 Primes and Squares
4.1 Notes

5 $T_2$’s Lemma
5.1 Notes
Addendum 5.A Hölder’s Inequality in Action

6 Some Classical Problems in Extremal Graph Theory
6.1 Notes
Addendum 6.A Some Pearls of Extremal Graph Theory
Addendum 6.B Probabilities in Combinatorics

7 Complex Combinatorics
7.1 Tiling and coloring problems
7.2 Counting problems
7.3 Miscellaneous problems
7.4 Notes
Addendum 7.A Finite Fourier Analysis

8 Formal Series Revisited
8.1 Counting problems
8.2 Proving identities using generating functions
8.3 Recurrence relations
8.4 Additive properties
8.5 Miscellaneous problems
8.6 Notes

9 A Little Introduction to Algebraic Number Theory
9.1 Tools from linear algebra
9.2 Cyclotomy
9.3 The gcd trick
9.4 The theorem of symmetric polynomials
9.5 Ideal theory and local methods
9.6 Miscellaneous problems
9.7 Notes
Addendum 9.A Equations over Finite Fields
Addendum 9.B A Glimpse of Algebraic Number Theory

10 Arithmetic Properties of Polynomials
10.1 The $a - b \mid f(a) - f(b)$ trick
10.2 Derivatives and p-adic Taylor expansions
10.3 Hilbert polynomials and Mahler expansions
10.4 p-adic estimates
10.5 Miscellaneous problems
10.6 Notes

11 Lagrange Interpolation Formula
11.1 Notes

12 Higher Algebra in Combinatorics
12.1 The determinant trick
12.2 Matrices over $\mathbb{F}_2$
12.3 Applications of bilinear algebra
12.4 Matrix equations
12.5 The linear independence trick
12.6 Applications to geometry
12.7 Notes

Bibliography

Chapter 1 


Some Useful Substitutions 


Let us first recall the classical substitutions that will be used in the fol- 
lowing problems. All of these are discussed in detail in [3], chapter 1 and the 
reader is invited to take a closer look there. 

Consider three positive real numbers $a, b, c$. If $abc = 1$, a classical
substitution is
\[ a = \frac{x}{y}, \quad b = \frac{y}{z}, \quad c = \frac{z}{x}. \]
A less classical one is
\[ a = \frac{x}{y+z}, \quad b = \frac{y}{z+x}, \quad c = \frac{z}{x+y} \]
(for some positive real numbers $x, y, z$) whenever $ab + bc + ca + 2abc = 1$, or
its equivalent version
\[ a = \frac{y+z}{x}, \quad b = \frac{z+x}{y}, \quad c = \frac{x+y}{z} \]
when $abc = a + b + c + 2$ (the equivalence between the two relations follows
from the substitution $(a,b,c) \to \left(\frac1a, \frac1b, \frac1c\right)$). Two other very useful
substitutions concern the relations $a^2 + b^2 + c^2 = abc + 4$ and
$a^2 + b^2 + c^2 + 2abc = 1$. In the first case, with the extra assumption
$\max(a,b,c) \geq 2$, one can find positive numbers $x, y, z$ such that $xyz = 1$ and
\[ a = x + \frac1x, \quad b = y + \frac1y, \quad c = z + \frac1z. \]



In the second case one can find an acute-angled triangle $ABC$ such that
\[ a = \cos(A), \quad b = \cos(B), \quad c = \cos(C). \]


Of course, in practice one often needs to use a mixture of these substitutions 
and to be rather familiar with classical identities and inequalities. But expe- 
rience comes with practice, so let us delve into some exercises and problems 
to see how things really work. 
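
Before moving on, the three algebraic substitutions recalled above can be checked mechanically. The following is only a sanity-check sketch in exact rational arithmetic; the helper name `check_substitutions` and the sample grid are our own choices and do not come from [3].

```python
from fractions import Fraction as F
from itertools import product

def check_substitutions(x, y, z):
    """Return True if the three substitution identities hold exactly for x, y, z > 0."""
    # a = x/(y+z), b = y/(z+x), c = z/(x+y) should satisfy ab + bc + ca + 2abc = 1
    a, b, c = x / (y + z), y / (z + x), z / (x + y)
    first = a * b + b * c + c * a + 2 * a * b * c == 1
    # the reciprocals (y+z)/x, (z+x)/y, (x+y)/z should satisfy abc = a + b + c + 2
    a, b, c = 1 / a, 1 / b, 1 / c
    second = a * b * c == a + b + c + 2
    # u = x/y, v = y/z, w = z/x have product 1, and a = u + 1/u, b = v + 1/v,
    # c = w + 1/w should satisfy a^2 + b^2 + c^2 = abc + 4
    u, v, w = x / y, y / z, z / x
    a, b, c = u + 1 / u, v + 1 / v, w + 1 / w
    third = a * a + b * b + c * c == a * b * c + 4
    return first and second and third

# a small grid of positive rationals (hypothetical sample values)
samples = [F(n, d) for n, d in product(range(1, 6), range(1, 4))]
all_ok = all(check_substitutions(x, y, z) for x, y, z in product(samples, repeat=3))
print(all_ok)
```

Such a check proves nothing, of course, but it is a quick way to catch a misremembered substitution before using it in a solution.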


1.1 The relation $a^2 + b^2 + c^2 = abc + 4$


We start with an easy exercise, based on the resolution of a quadratic 
equation. 


1. Prove that if $a, b, c > 0$ satisfy $|a^2 + b^2 + c^2 - 4| = abc$, then
\[ (a-2)(b-2) + (b-2)(c-2) + (c-2)(a-2) \geq 0. \]
Titu Andreescu, Gazeta Matematica


Proof. If $\max(a,b,c) < 2$, then everything is clear, so assume that
$\max(a,b,c) \geq 2$. Then $a^2 + b^2 + c^2 - abc = 4$, so there exist positive
numbers $x, y, z$ such that $xyz = 1$ and
\[ a = x + \frac1x, \quad b = y + \frac1y, \quad c = z + \frac1z. \]
But then $a, b, c \geq 2$ and we are done again. □


Proof. The most natural idea is to consider the hypothesis as a quadratic
equation in $a$, for instance. It becomes $a^2 \pm abc + b^2 + c^2 - 4 = 0$, and
solving the equation yields
\[ a = \frac{\mp bc \pm \sqrt{(b^2-4)(c^2-4)}}{2} \]
(the sign in front of the square root being independent of the first one).
Thus $(b^2-4)(c^2-4) = (bc \pm 2a)^2$, which can also be written as
\[ (b-2)(c-2) = \frac{(bc \pm 2a)^2}{(b+2)(c+2)} \geq 0. \]
Writing similar expressions for the other two variables, we are done. □


The following exercise is trickier and one needs some algebraic skills in 
order to solve it. We present two solutions, neither of which is really easy. 


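2. Find all triples $x, y, z$ of positive real numbers such that
\[ \begin{cases} x^2 + y^2 + z^2 = xyz + 4 \\ xy + yz + zx = 2(x + y + z). \end{cases} \]


Proof. By the second equation we have $\max(x,y,z) \geq 2$ and so the first
equation yields the existence of positive real numbers $a, b, c$ such that
\[ x = a + \frac1a, \quad y = b + \frac1b, \quad z = c + \frac1c \]
and $abc = 1$. The second equation then becomes
\[ \sum \left( ab + \frac{a}{b} + \frac{b}{a} + \frac{1}{ab} \right) = 2 \sum \left( a + \frac1a \right). \]
Since $abc = 1$, we have
\[ \sum \frac{1}{ab} = \sum a, \qquad \sum \frac1a = \sum ab, \]
so the second equation can be written
\[ \sum \left( \frac{a}{b} + \frac{b}{a} \right) = \sum a + \sum ab. \]
The left-hand side is also equal to
\[ \sum c(a^2 + b^2) = \left( \sum a \right) \left( \sum ab \right) - 3, \]
because
\[ \frac{a}{b} + \frac{b}{a} = \frac{a^2 + b^2}{ab} = c(a^2 + b^2). \]
We deduce that $(\sum a - 1)(\sum ab - 1) = 4$. Since $\sum a \geq 3$ and
$\sum ab \geq 3$ (by the AM-GM inequality and the fact that $abc = 1$), this can
only happen if $a = b = c = 1$ and thus when $x = y = z = 2$. □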





Proof. If $x + y = 2$, the second equation yields $xy = 4$, so that
$(x-y)^2 = -12$, which is a contradiction. Thus $x + y \neq 2$ and similarly
$y + z \neq 2$, $z + x \neq 2$. The second equation yields
\[ z = 2 + \frac{4 - xy}{x + y - 2}, \]
and a rather brutal insertion of this expression in the first equation gives
\[ (x-y)^2 = \left( \frac{4 - xy}{2 - x - y} \right)^2 (1 - x - y). \]
Unless $x = y = 2$, this forces $x + y \leq 1$, and in particular implies the
inequality $2 > x + y$. If two of the numbers $x, y, z$ are equal to 2, then
trivially so is the third one. If not, the previous argument shows that
$2 > x + y$, $2 > y + z$ and $2 > z + x$. But then the second equation yields
\[ x + y + z > \sum_{\text{cyc}} x \cdot \frac{y+z}{2} = xy + yz + zx = 2(x + y + z), \]
a contradiction. Thus, the only solution is $x = y = z = 2$. □
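
The "rather brutal insertion" in the second solution is easy to get wrong by hand, so here is a small exact-arithmetic sketch comparing both sides. It assumes the eliminated form $z = 2 + (4-xy)/(x+y-2)$ and the closed form $(x-y)^2 = \left(\frac{4-xy}{2-x-y}\right)^2(1-x-y)$; the function names and the sample grid are hypothetical.

```python
from fractions import Fraction as F

def residual_direct(x, y):
    """x^2 + y^2 + z^2 - xyz - 4, with z eliminated via the second equation."""
    z = 2 + (4 - x * y) / (x + y - 2)
    return x * x + y * y + z * z - x * y * z - 4

def residual_closed(x, y):
    """(x - y)^2 - ((4 - xy)/(2 - x - y))^2 * (1 - x - y)."""
    return (x - y) ** 2 - ((4 - x * y) / (2 - x - y)) ** 2 * (1 - x - y)

# rational sample points, skipping the excluded case x + y = 2
pairs = [(F(p, 7), F(q, 5)) for p in range(1, 40) for q in range(1, 40)
         if F(p, 7) + F(q, 5) != 2]
identity_holds = all(residual_direct(x, y) == residual_closed(x, y) for x, y in pairs)
print(identity_holds)
```

Both residuals vanish exactly when the first equation of the system holds, so agreement on a grid of rational points is good evidence that no sign was lost during the substitution.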


The following problem hides under a clever algebraic manipulation a very 
simple AM-GM argument. The inequality is quite strong, as the reader can 
easily see by trying a brute-force approach. 


3. Prove that if $a, b, c \geq 2$ satisfy $a^2 + b^2 + c^2 = abc + 4$, then
\[ a + b + c + ab + ac + bc \geq 2\sqrt{(a + b + c + 3)(a^2 + b^2 + c^2 - 3)}. \]


Marian Tetiva 




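Proof. The hypothesis yields the existence of positive real numbers $x, y, z$
such that
\[ a = \frac{x}{y} + \frac{y}{x}, \quad b = \frac{y}{z} + \frac{z}{y}, \quad c = \frac{z}{x} + \frac{x}{z}. \]
The miracle is that both sides of the inequality have very nice factorizations.
For the right-hand side, this is easy to observe, since
\[ a + b + c + 3 = (xy + yz + zx) \left( \frac{1}{xy} + \frac{1}{yz} + \frac{1}{zx} \right) \]
and
\[ a^2 + b^2 + c^2 - 3 = \left( \sum x^2 \right) \left( \sum \frac{1}{x^2} \right). \]
For the left-hand side, things are more subtle, but one finally reaches the
identity
\[ a + b + c + ab + bc + ca = \left( \sum x^2 \right) \left( \sum \frac{1}{xy} \right) + \left( \sum xy \right) \left( \sum \frac{1}{x^2} \right). \]
The desired inequality becomes simply the AM-GM inequality for two numbers! □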
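
The three factorizations used in this proof can be confirmed on sample values. The sketch below works in exact rational arithmetic; `factorizations_hold` is a hypothetical helper name, and the sampled range is arbitrary.

```python
from fractions import Fraction as F
from itertools import product

def factorizations_hold(x, y, z):
    """Check the three identities of the proof for positive rationals x, y, z."""
    a, b, c = x / y + y / x, y / z + z / y, z / x + x / z
    s2 = x * x + y * y + z * z                       # sum of x^2
    s11 = x * y + y * z + z * x                      # sum of xy
    r2 = 1 / (x * x) + 1 / (y * y) + 1 / (z * z)     # sum of 1/x^2
    r11 = 1 / (x * y) + 1 / (y * z) + 1 / (z * x)    # sum of 1/(xy)
    return (a + b + c + 3 == s11 * r11
            and a * a + b * b + c * c - 3 == s2 * r2
            and a + b + c + a * b + b * c + c * a == s2 * r11 + s11 * r2)

ok = all(factorizations_hold(F(p), F(q, 2), F(r, 3))
         for p, q, r in product(range(1, 6), repeat=3))
print(ok)
```

With $P = (\sum x^2)(\sum \frac{1}{xy})$ and $Q = (\sum xy)(\sum \frac{1}{x^2})$, the left-hand side is $P + Q$ and the product under the square root is $PQ$, which is exactly the shape $P + Q \geq 2\sqrt{PQ}$.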


The following problems are rather tricky mixtures of algebraic manipula- 
tions and elementary number theory. 


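4. Find all triplets of positive integers $(k, l, m)$ with sum 2002 and for which
the system
\[ \begin{cases} \dfrac{x}{y} + \dfrac{y}{x} = k \\[2mm] \dfrac{y}{z} + \dfrac{z}{y} = l \\[2mm] \dfrac{z}{x} + \dfrac{x}{z} = m \end{cases} \]
has real solutions.
Titu Andreescu, proposed for IMO 2002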




Proof. The system has solutions if and only if
\[ k^2 + l^2 + m^2 = klm + 4. \]
An easy computation shows that this relation is equivalent to
\[ (k+2)(l+2)(m+2) = (k + l + m + 2)^2. \]
As $k + l + m = 2002$, we deduce that any solution of the problem satisfies
$k + l + m = 2002$ and
\[ (k+2)(l+2)(m+2) = (k + l + m + 2)^2 = 2004^2 = 2^4 \cdot 3^2 \cdot 167^2. \]
A simple case analysis shows that the only solutions are $k = l = 1000$, $m = 2$
and its permutations. The result follows. □
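
The "simple case analysis" can be double-checked by exhaustive search, since there are only a few hundred thousand ordered candidates. This is a brute-force sketch, not the argument intended by the proof; `triples_with_sum` is a hypothetical helper.

```python
def triples_with_sum(total):
    """All k <= l <= m of positive integers with k + l + m = total and
    (k+2)(l+2)(m+2) = (total + 2)^2."""
    target = (total + 2) ** 2
    found = []
    for k in range(1, total // 3 + 1):
        for l in range(k, (total - k) // 2 + 1):
            m = total - k - l          # m >= l by the bound on l
            if (k + 2) * (l + 2) * (m + 2) == target:
                found.append((k, l, m))
    return found

solutions = triples_with_sum(2002)
print(solutions)  # → [(2, 1000, 1000)]
```

The search agrees with the divisibility argument: $167^2 > 2004$ forces two of the factors $k+2$, $l+2$, $m+2$ to be multiples of 167, which leaves very few cases.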


5. Solve in positive integers the equation
\[ (x+2)(y+2)(z+2) = (x + y + z + 2)^2. \]


Titu Andreescu 


Proof. A simple algebraic manipulation shows that the equation is equivalent
to $x^2 + y^2 + z^2 = xyz + 4$ and, seeing this as a quadratic equation in $z$, we
obtain the equivalent form $(x^2 - 4)(y^2 - 4) = (xy - 2z)^2$. If $x^2 < 4$, then
$y^2 < 4$ as well and so $x = y = 1$, yielding $z = 2$. If $x = 2$, then $y = z$. In
all other cases, we can find a positive square-free integer $D$ (which is easily
seen to be different from 1) and positive integers $u, v$ such that
$x^2 - 4 = Du^2$ and $y^2 - 4 = Dv^2$. Thus, solving the problem comes down to
solving the generalized Pell equation $a^2 - Db^2 = 4$, which is a classical
topic: this equation always has nontrivial integer solutions and if $(a_0, b_0)$
is the smallest solution with $a_0, b_0 > 0$, then all solutions are given by
\[ a_n = \left( \frac{a_0 + b_0\sqrt{D}}{2} \right)^n + \left( \frac{a_0 - b_0\sqrt{D}}{2} \right)^n, \]
\[ b_n = \frac{1}{\sqrt{D}} \left[ \left( \frac{a_0 + b_0\sqrt{D}}{2} \right)^n - \left( \frac{a_0 - b_0\sqrt{D}}{2} \right)^n \right]. \]
□
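
To get a feeling for the solution set, one can list the small solutions and confirm on each of them the two equivalent forms used in the proof. The following is a brute-force sketch only; `small_solutions` and the bound 60 are our own illustrative choices.

```python
def small_solutions(bound):
    """Solutions x <= y <= z <= bound of (x+2)(y+2)(z+2) = (x+y+z+2)^2."""
    sols = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            for z in range(y, bound + 1):
                if (x + 2) * (y + 2) * (z + 2) == (x + y + z + 2) ** 2:
                    # the equivalent forms from the proof must hold as well
                    assert x * x + y * y + z * z == x * y * z + 4
                    assert (x * x - 4) * (y * y - 4) == (x * y - 2 * z) ** 2
                    sols.append((x, y, z))
    return sols

sols = small_solutions(60)
print(sols[:8])
```

The list contains the sporadic triple $(1, 1, 2)$, the family $(2, n, n)$, and triples such as $(3, 3, 7)$, for which $x^2 - 4 = y^2 - 4 = 5$ and hence $D = 5$.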








Part of the following problem can be dealt with in a classical way, but we do
not know how to solve it entirely without using the trick of substitutions.


6. The sequence $(a_n)_{n \geq 0}$ is defined by $a_0 = a_1 = 97$ and
\[ a_{n+1} = a_n a_{n-1} + \sqrt{(a_n^2 - 1)(a_{n-1}^2 - 1)} \]
for all $n \geq 1$. Prove that $2 + \sqrt{2 + 2a_n}$ is a perfect square for all $n$.
Titu Andreescu

Proof. Writing
\[ (a_{n+1} - a_n a_{n-1})^2 = (a_n^2 - 1)(a_{n-1}^2 - 1) \]
and simplifying this expression yields
\[ a_{n-1}^2 + a_n^2 + a_{n+1}^2 - 2a_{n-1} a_n a_{n+1} = 1, \]
thus
\[ (2a_{n-1})^2 + (2a_n)^2 + (2a_{n+1})^2 - (2a_{n-1})(2a_n)(2a_{n+1}) = 4. \]
Since we clearly have $a_n > 2$ for all $n$, this implies the existence of a
sequence $x_n > 1$ such that $2a_n = x_n + x_n^{-1}$ and such that
$x_{n+1} = x_n x_{n-1}$. Thus $\log x_n$ satisfies a Fibonacci-type recursive
relation and so we can immediately find out the general term of the sequence
$(a_n)_n$. Namely, a small computation shows that if we define
$\alpha = 2 + \sqrt{3}$, then $x_n = \alpha^{4F_n}$, where $F_n$ is the $n$th
Fibonacci number (with $F_0 = F_1 = 1$). Thus
\[ a_n = \frac12 \left( \alpha^{4F_n} + \frac{1}{\alpha^{4F_n}} \right) \]
and so
\[ 2 + \sqrt{2 + 2a_n} = 2 + \left( \alpha^{2F_n} + \frac{1}{\alpha^{2F_n}} \right) = \left( \alpha^{F_n} + \frac{1}{\alpha^{F_n}} \right)^2. \]
The result follows, since $\alpha^n + \alpha^{-n} \in \mathbb{Z}$ for all $n$, by
the binomial formula. □
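
The statement is easy to test on the first few terms with exact integer arithmetic. The sketch below computes the sequence directly from its definition, checking along the way that the square root is an exact integer; the helper names are hypothetical.

```python
from math import isqrt

def sequence_terms(count):
    """First `count` terms of the sequence, computed from the defining recurrence."""
    a = [97, 97]
    while len(a) < count:
        prod = (a[-1] ** 2 - 1) * (a[-2] ** 2 - 1)
        root = isqrt(prod)
        assert root * root == prod  # the square root is an exact integer
        a.append(a[-1] * a[-2] + root)
    return a[:count]

def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n

def claim_holds(a_n):
    """Is 2 + sqrt(2 + 2 a_n) a perfect square (with an exact inner root)?"""
    inner = 2 + 2 * a_n
    return is_square(inner) and is_square(2 + isqrt(inner))

terms = sequence_terms(10)
print(terms[:3], all(claim_holds(t) for t in terms))
```

For instance $a_2 = 97 \cdot 97 + (97^2 - 1) = 18817$, and $2 + \sqrt{2 + 2 \cdot 18817} = 2 + 194 = 14^2$. The terms grow doubly exponentially, so only a handful of them are worth printing.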




Remark 1.1. Here is an alternative proof of the fact that all terms of the
sequence are integers, without the use of substitutions. The method that we
will use for this problem appears in many other problems. As we saw in the
previous solution, the sequence satisfies the recursive relation
\[ a_{n+1}^2 + a_n^2 + a_{n-1}^2 - 2a_{n+1} a_n a_{n-1} = 1. \]
Writing the same relation for $n + 1$ instead of $n$ and subtracting the two
yields the identity
\[ a_{n+2}^2 - a_{n-1}^2 = 2a_n a_{n+1}(a_{n+2} - a_{n-1}). \]
Note that $(a_n)_n$ is an increasing sequence (this follows trivially by
induction from the recursive relation), so that we can divide by
$a_{n+2} - a_{n-1} \neq 0$ in the previous relation and get
$a_{n+2} = 2a_n a_{n+1} - a_{n-1}$. The last relation clearly implies that all
terms of the sequence are integers (since one can immediately check that this
is the case with the first three terms of the sequence). Note however that it
does not seem to follow easily that $2 + \sqrt{2 + 2a_n}$ is a perfect square
using this method.
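
The square-root-free recurrence of this remark can be run directly. The following sketch starts from the first three terms (the value $a_2 = 18817$ is computed from the definition) and checks that the cubic relation is preserved; the function names are our own.

```python
def terms_by_recurrence(count):
    """Terms computed with a_{n+2} = 2 a_n a_{n+1} - a_{n-1}."""
    a = [97, 97, 18817]  # a_2 = 97*97 + (97^2 - 1), from the original definition
    while len(a) < count:
        a.append(2 * a[-2] * a[-1] - a[-3])
    return a[:count]

def satisfies_cubic_relation(a, n):
    """a_{n-1}^2 + a_n^2 + a_{n+1}^2 - 2 a_{n-1} a_n a_{n+1} == 1 ?"""
    return (a[n - 1] ** 2 + a[n] ** 2 + a[n + 1] ** 2
            - 2 * a[n - 1] * a[n] * a[n + 1]) == 1

a = terms_by_recurrence(10)
print(a[3], all(satisfies_cubic_relation(a, n) for n in range(1, 9)))
```

Only integer additions and multiplications occur here, which is precisely why integrality becomes obvious in this form.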


Remark 1.2. There are a lot of examples of very complicated recurrence
relations that rather unexpectedly yield integers. For instance, the reader
can try to prove the following result concerning Somos-5 sequences: let
$a_1 = a_2 = \cdots = a_5 = 1$ and let
\[ a_{n+5} = \frac{a_{n+1} a_{n+4} + a_{n+2} a_{n+3}}{a_n} \]
for $n > 0$. Then $a_n$ is an integer for all $n$. Similarly one defines Somos-6,
Somos-7, etc. sequences by the formulas
\[ a_0 = a_1 = \cdots = a_5 = 1, \quad a_{n+6} = \frac{a_{n+1} a_{n+5} + a_{n+2} a_{n+4} + a_{n+3}^2}{a_n}, \]
\[ a_0 = a_1 = \cdots = a_6 = 1, \quad a_{n+7} = \frac{a_{n+1} a_{n+6} + a_{n+2} a_{n+5} + a_{n+3} a_{n+4}}{a_n}, \]
etc. One can prove (though this is not easy) that all terms of Somos-6 and
Somos-7 sequences are integers. Surprisingly, this fails for Somos-8 sequences
(in which case $a_{17}$ is no longer an integer!).
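
These claims are pleasant to explore numerically. The sketch below generates Somos-$k$ sequences in exact rational arithmetic, assuming the general pattern $a_{n+k} a_n = \sum_{i=1}^{\lfloor k/2 \rfloor} a_{n+i} a_{n+k-i}$ and indexing every sequence from $a_0$ (for Somos-5 this only shifts the indices of the remark by one); `somos` and `first_non_integer` are hypothetical helper names.

```python
from fractions import Fraction

def somos(k, count):
    """First `count` terms of the Somos-k sequence started from k ones (0-indexed)."""
    a = [Fraction(1)] * k
    while len(a) < count:
        n = len(a) - k
        # a_{n+k} * a_n = sum over i = 1..floor(k/2) of a_{n+i} * a_{n+k-i}
        num = sum(a[n + i] * a[n + k - i] for i in range(1, k // 2 + 1))
        a.append(num / a[n])
    return a

def first_non_integer(k, count):
    """Index of the first non-integer term among the first `count`, or None."""
    for idx, term in enumerate(somos(k, count)):
        if term.denominator != 1:
            return idx
    return None

for k in (5, 6, 7, 8):
    print(k, first_non_integer(k, 40))
```

For $k = 8$ the terms run $1, \ldots, 1, 4, 7, 13, 25, 61, 187, 775, 5827, 14815$ and then $a_{17} = 420514/7$, which is not an integer. A finite computation naturally says nothing about integrality of all terms for $k = 5, 6, 7$.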




1.2 The relations $abc = a + b + c + 2$ and $ab + bc + ca + 2abc = 1$


The first inequality in the following problem is very useful in practice and 
we will meet it very often in the following problems. 


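7. Prove that if $x, y, z > 0$ and $xyz = x + y + z + 2$, then
\[ xy + yz + zx \geq 2(x + y + z) \quad \text{and} \quad \sqrt{x} + \sqrt{y} + \sqrt{z} \leq \frac32 \sqrt{xyz}. \]


Proof. With the usual substitutions
\[ x = \frac{b+c}{a}, \quad y = \frac{c+a}{b}, \quad z = \frac{a+b}{c}, \]
the first inequality comes down (after clearing denominators and canceling out
similar terms) to Schur’s inequality
\[ a(a-b)(a-c) + b(b-a)(b-c) + c(c-a)(c-b) \geq 0, \]
while the second one follows by adding up the inequalities
\[ \sqrt{\frac{1}{xy}} = \sqrt{\frac{a}{a+c} \cdot \frac{b}{b+c}} \leq \frac12 \left( \frac{a}{c+a} + \frac{b}{b+c} \right): \]
the three right-hand sides sum to $\frac32$, while the left-hand sides sum to
$(\sqrt{x} + \sqrt{y} + \sqrt{z})/\sqrt{xyz}$. □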


The reader may find the first method of proof of the following problem a bit
strange, but it is actually quite a powerful one. We will use this kind of
argument again; see problem 11, for instance. Also, the third solution uses a
very useful technique.











8. Let $x, y, z > 0$ be such that $xy + yz + zx = 2(x + y + z)$. Prove that
\[ xyz \leq x + y + z + 2. \]


Gabriel Dospinescu, Mircea Lascu 




Proof. We will argue by contradiction, assuming that ryz > r+y+z+2. We 
claim that we can find 0 < r < 1 such that X = rz,Y = ry,Z = rz satisfy 
XYZ=X+Y+2Z+4+2. Indeed, this comes down to the vanishing of 


f(r) =r ryz —r(at+y+z)—-2 


between 0 and 1, and this is clear, since f(0) < 0 and f(1) > 0. Next, the 
condition ry + yz + zx = 2(x + y + z) yields 


XY+YZ4+2Z2X =2r(X+Y+4+Z)<2(X4+Y4+2Z). 
This contradicts the first inequality of problem 7. O 
Proof. The condition can also be rewritten in the form

$$(x-1)(y-1) + (y-1)(z-1) + (z-1)(x-1) = 3$$

or in the form

$$xyz - x - y - z - 1 = (x-1)(y-1)(z-1).$$

We will discuss several cases. If $x, y, z > 1$, then by the AM-GM inequality
and the first identity, we get

$$1 \ge \sqrt[3]{(x-1)^2(y-1)^2(z-1)^2},$$

which yields, thanks to the second identity, the desired estimate.

If $x, y, z \le 1$ or if only one of the numbers $x, y, z$ is smaller than or equal
to 1, then $(x-1)(y-1)(z-1) \le 0$ and so $xyz \le x + y + z + 1$ in this case.
Finally, if two of the numbers are smaller than 1, say $x, y < 1$, the desired
inequality can be written in the form $0 \le x + y + z(1 - xy) + 2$, which is
obvious. □


Proof. For three positive real numbers $x, y, z$ consider fixing the first two
elementary symmetric polynomials $\sigma_1 = x + y + z$ and $\sigma_2 = xy + yz + zx$ and
letting $\sigma_3 = xyz$ vary. This amounts to varying only the constant term in the
polynomial

$$p(t) = t^3 - \sigma_1t^2 + \sigma_2t - \sigma_3 = (t-x)(t-y)(t-z)$$

and defining $x, y, z$ to be the three roots of this polynomial (in some order).
Increasing $\sigma_3$, i.e. lowering the constant term, corresponds geometrically to
lowering the graph. As we lower the graph, the smallest root increases, thus
we maintain three positive real roots until the smallest root becomes a double
root. If the double root is at $t = a$ and the larger root at $t = b$, then we have
$\sigma_1 = 2a + b$, $\sigma_2 = a^2 + 2ab$ and $\sigma_3 = a^2b$.

If we fix $\sigma_1$ and $\sigma_2$ with $\sigma_2 = 2\sigma_1$ as hypothesized, then we find
$b = \frac{(4-a)a}{2(a-1)}$ and because $0 < a \le b$, we see that $1 < a \le 2$. By the
discussion above $xyz = \sigma_3 \le a^2b$, so it suffices to show that

$$a^2b = \frac{(4-a)a^3}{2(a-1)} \le 2a + b + 2 = 2a + \frac{(4-a)a}{2(a-1)} + 2$$

for $1 < a \le 2$. But this rearranges to $(a-2)^2(a^2-1) \ge 0$ and we are done. □


The technique used in the first solution of the next problem is rather ver- 
satile and the reader is invited to read the addendum 5.A for more examples. 


9. Let $x, y, z > 0$ be such that $xy + yz + zx + xyz = 4$. Prove that

$$3\left(\frac{1}{\sqrt{x}} + \frac{1}{\sqrt{y}} + \frac{1}{\sqrt{z}}\right)^2 \ge (x+2)(y+2)(z+2).$$

Gabriel Dospinescu


Proof. Using the usual substitution

$$x = \frac{2a}{b+c}, \quad y = \frac{2b}{c+a}, \quad z = \frac{2c}{a+b},$$

the problem reduces to proving the inequality

$$\sum\sqrt{\frac{b+c}{a}} \ge 4\sqrt{\frac{(a+b+c)^3}{3(a+b)(b+c)(c+a)}}.$$

This is a quite strong inequality and it is easy to convince oneself that
most applications of classical techniques fail. However, the following smart
application of Hölder's inequality does the job:

$$\left(\sum\sqrt{\frac{b+c}{a}}\right)^2\left(\sum a(b+c)^2\right) \ge \left(\sum (b+c)\right)^3,$$

so it is enough to prove that

$$3\prod (a+b) \ge 2\sum a(b+c)^2.$$

This reduces after expanding to $\sum a(b-c)^2 \ge 0$, which is clear. □





Proof. First, we get rid of those nasty square roots, via the substitution

$$xy = 4c^2, \quad yz = 4a^2, \quad zx = 4b^2.$$

Then

$$x = \frac{2bc}{a}, \quad y = \frac{2ca}{b}, \quad z = \frac{2ab}{c}$$

and replacing these values in the inequality yields the equivalent form

$$3(a+b+c)^2 \ge 16(a+bc)(b+ca)(c+ab).$$

The hypothesis becomes $a^2 + b^2 + c^2 + 2abc = 1$, so that there exists an
acute-angled triangle $ABC$ such that $a = \cos A$, $b = \cos B$, $c = \cos C$. Next,
observe that

$$c + ab = \cos C + \cos A\cos B = -\cos(A+B) + \cos A\cos B = \sin A\sin B.$$

Using this (and similar identities obtained by permuting the variables), the
desired inequality becomes

$$\sqrt{3}\sum\cos A \ge 4\prod\sin A.$$

Using the well-known identities

$$s = 4R\prod\cos\frac{A}{2}, \quad r = 4R\prod\sin\frac{A}{2}, \quad \sum\cos A = 1 + \frac{r}{R},$$

the inequality becomes

$$\sqrt{3}\,\frac{r+R}{R} \ge \frac{2rs}{R^2},$$

or $(R+r)R\sqrt{3} \ge 2rs$. This splits trivially into $R + r \ge 3r$ (Euler's inequality)
and $s \le \frac{3\sqrt{3}}{2}R$, the last one being well-known and easy. □


Here is yet another easy application of problem 7. 


10. Let $u, v, w > 0$ be positive real numbers such that

$$u + v + w + \sqrt{uvw} = 4.$$

Prove that

$$\sqrt{\frac{uv}{w}} + \sqrt{\frac{vw}{u}} + \sqrt{\frac{wu}{v}} \ge u + v + w.$$

China TST 2007


Proof. To get rid of those nasty square roots, let us perform the substitution

$$\sqrt{\frac{uv}{w}} = c, \quad \sqrt{\frac{vw}{u}} = a, \quad \sqrt{\frac{wu}{v}} = b.$$

Then $u = bc$, $v = ca$ and $w = ab$, so that the inequality becomes
$a + b + c \ge ab + bc + ca$ and the hypothesis is $ab + bc + ca + abc = 4$. Another
substitution

$$a = \frac{2}{x}, \quad b = \frac{2}{y}, \quad c = \frac{2}{z}$$

yields $xyz = x + y + z + 2$. We need to prove that $xy + yz + zx \ge 2(x + y + z)$,
which is the first component of problem 7. □


The following problem has some common points with problem 7 and one 
can actually deduce it from that result. But the proof is not formal. 


11. Prove that if $a, b, c > 0$ and $x = a + \frac{1}{b}$, $y = b + \frac{1}{c}$, $z = c + \frac{1}{a}$, then

$$xy + yz + zx \ge 2(x + y + z).$$

Vasile Cârtoaje




Proof. The method is the same as the one used in problem 8. The starting
point is the observation that $xyz \ge 2 + x + y + z$. Indeed,

$$xyz = abc + \frac{1}{abc} + a + b + c + \frac{1}{a} + \frac{1}{b} + \frac{1}{c} \ge 2 + x + y + z.$$

Let us assume for sake of contradiction that $xy + yz + zx < 2(x + y + z)$ and
let us choose $r \in (0, 1]$ such that $X = rx$, $Y = ry$, $Z = rz$ satisfy
$XYZ = 2 + X + Y + Z$. This equality is equivalent to $r^3xyz = 2 + r(x + y + z)$ and such
$r$ exists by continuity of the function $f(r) = r^3xyz - 2 - r(x + y + z)$ and by the
fact that $f(1) \ge 0$ and $f(0) < 0$. The hypothesis $xy + yz + zx < 2(x + y + z)$
can also be written as

$$XY + YZ + ZX < 2r(X + Y + Z) \le 2(X + Y + Z).$$

This contradicts the first inequality in problem 7. □


Proof. As in the previous proof, we obtain that $xyz \ge x + y + z + 2$. Since we
obviously have $\min(xy, yz, zx) > 1$, this implies that $z \ge \frac{2+x+y}{xy-1}$ and similar
inequalities obtained by permuting the variables. Next, we have

$$x + y + z = a + \frac{1}{a} + b + \frac{1}{b} + c + \frac{1}{c} \ge 6.$$

In particular, there are two numbers, say $x, y$, such that $x + y \ge 4$. The
inequality to be proved is equivalent to $z \ge \frac{2(x+y)-xy}{x+y-2}$, so we are done if we
can prove that

$$\frac{2+x+y}{xy-1} \ge \frac{2(x+y)-xy}{x+y-2}.$$

But this is equivalent, after an easy computation (in which it is convenient
to denote $S = x + y$ and $P = xy$), to $(xy - y - x)^2 \ge (x-2)(y-2)$. If
$(x-2)(y-2) \le 0$, this is clear; otherwise $x, y > 2$ as $x + y \ge 4$. Let
$u = x - 2$, $v = y - 2$, then the inequality becomes $(uv + u + v)^2 \ge uv$, with
$u, v > 0$ and it is obvious. □


The form of the following inequality strongly suggests the use of the 
Cauchy-Schwarz inequality. It turns out that this approach works, but in 
a rather indirect and mysterious way, which makes the problem rather hard. 




12. Prove that for all $a, b, c > 0$,

$$\frac{(b+c-a)^2}{(b+c)^2+a^2} + \frac{(c+a-b)^2}{(c+a)^2+b^2} + \frac{(a+b-c)^2}{(a+b)^2+c^2} \ge \frac{3}{5}.$$

Japan 1997


Proof. With the usual substitution

$$x = \frac{b+c}{a}, \quad y = \frac{a+c}{b}, \quad z = \frac{a+b}{c},$$

the problem asks to prove that for any positive numbers $x, y, z$ such that
$xyz = x + y + z + 2$ we have

$$\frac{(x-1)^2}{x^2+1} + \frac{(y-1)^2}{y^2+1} + \frac{(z-1)^2}{z^2+1} \ge \frac{3}{5}.$$

Applying the Cauchy-Schwarz inequality (or what is also called Titu's lemma),
we obtain the bound

$$\frac{(x-1)^2}{x^2+1} + \frac{(y-1)^2}{y^2+1} + \frac{(z-1)^2}{z^2+1} \ge \frac{(x+y+z-3)^2}{x^2+y^2+z^2+3}.$$

It remains to prove that the last quantity is at least $\frac{3}{5}$. Let $S = x + y + z$ and
$P = xy + yz + zx$, so that the inequality becomes

$$\frac{(S-3)^2}{S^2-2P+3} \ge \frac{3}{5}.$$

Notice that as $\frac{a}{b} + \frac{b}{a} \ge 2$ and cyclic permutations of this, we have $S \ge 6$. Also,
by problem 7 we have $P \ge 2S$. Therefore,

$$\frac{(S-3)^2}{S^2-2P+3} \ge \frac{(S-3)^2}{S^2-4S+3} = \frac{S-3}{S-1} = 1 - \frac{2}{S-1} \ge \frac{3}{5}.$$ □


Remark 1.3. There are other solutions for this problem: some of them are
shorter, but not easy to find. Here is a particularly elegant one, based on
the linearization technique: the inequality is homogeneous, so we may assume
that $a + b + c = 1$. We need to prove that

$$\sum_{cyc}\frac{(1-2a)^2}{(1-a)^2+a^2} \ge \frac{3}{5}.$$

The point is to bound from below $\frac{(1-2a)^2}{(1-a)^2+a^2}$ by an affine function of $a$, suitably
chosen. The best choice is the following:

$$\frac{(1-2a)^2}{(1-a)^2+a^2} \ge \frac{23}{25} - \frac{54a}{25} \iff \left(a - \frac{1}{3}\right)^2\left(a + \frac{1}{6}\right) \ge 0.$$

Of course, the question is how one came up with something like this. Well, it
is actually very easy: we need constants $A, B$ such that

$$\frac{(1-2a)^2}{(1-a)^2+a^2} \ge Aa + B$$

for all $a \in [0, 1]$, with equality for $a = \frac{1}{3}$. Imposing also the vanishing of the
derivative of the difference between the left and right-hand side at $\frac{1}{3}$ yields the
desired constants $A, B$. Once we have the previous estimates, it is very easy
to conclude by adding them and taking into account that $a + b + c = 1$.
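The recipe for finding $A$ and $B$ (tangent line at the expected equality point) is mechanical, so it can be delegated to a few lines of code. The sketch below is only an illustration under our own naming: it computes the tangent at $a = \frac{1}{3}$ in exact rational arithmetic, using the rewriting $f(a) = 2 - \frac{1}{q(a)}$ with $q(a) = 2a^2 - 2a + 1$, and then checks the resulting bound on a grid of $[0, 1]$.

```python
from fractions import Fraction

def f(a):
    """One summand of the normalized inequality (a + b + c = 1)."""
    return (1 - 2 * a) ** 2 / ((1 - a) ** 2 + a ** 2)

def tangent_line(a0):
    """Coefficients (A, B) of the tangent A*a + B to f at a0.

    Uses f(a) = 2 - 1/q(a) with q(a) = 2a^2 - 2a + 1,
    so that f'(a) = (4a - 2) / q(a)^2.
    """
    q = 2 * a0 ** 2 - 2 * a0 + 1
    slope = (4 * a0 - 2) / q ** 2
    return slope, f(a0) - slope * a0

if __name__ == "__main__":
    A, B = tangent_line(Fraction(1, 3))
    print(A, B)
    grid = [Fraction(k, 200) for k in range(201)]
    print(all(f(a) >= A * a + B for a in grid))
```

With $a_0 = \frac{1}{3}$ this gives $A = -\frac{54}{25}$ and $B = \frac{23}{25}$; summing the three bounds then yields $3B + A = \frac{3}{5}$.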


13. Find all real numbers $k$ with the following property: for all positive
numbers $a, b, c$ the following inequality holds

$$\left(k + \frac{a}{b+c}\right)\left(k + \frac{b}{c+a}\right)\left(k + \frac{c}{a+b}\right) \ge \left(k + \frac{1}{2}\right)^3.$$

Vietnam TST 2009





Proof. Take first of all $a = b = 1$ and arbitrary $c$ to get that

$$\left(k + \frac{1}{1+c}\right)^2\left(k + \frac{c}{2}\right) \ge \left(k + \frac{1}{2}\right)^3.$$

Letting $c \to 0$, we deduce that any $k$ as in the statement must satisfy

$$k(k+1)^2 \ge \left(k + \frac{1}{2}\right)^3,$$

which is equivalent to $4k^2 + 2k \ge 1$. We claim that any such $k$ is a solution of
the problem.

Pick such $k$ and perform the usual substitution

$$x = \frac{a}{b+c}, \quad y = \frac{b}{c+a}, \quad z = \frac{c}{a+b}$$

to reduce the problem to

$$k^3 + k^2(x+y+z) + k(xy+yz+zx) + xyz \ge \left(k + \frac{1}{2}\right)^3.$$

Now, we know that $xy + yz + zx + 2xyz = 1$ and it is by now a classical fact
(use problem 7 for $1/x, 1/y, 1/z$) that $x + y + z \ge 2(xy + yz + zx)$. Thus, it
is enough to ensure that

$$k^3 + (2k^2 + k)(xy+yz+zx) + \frac{1-(xy+yz+zx)}{2} \ge \left(k + \frac{1}{2}\right)^3.$$

This can be rewritten as

$$\left(2k^2 + k - \frac{1}{2}\right)\left(xy + yz + zx - \frac{3}{4}\right) \ge 0.$$

But the last inequality follows from the fact that $2k^2 + k - \frac{1}{2} \ge 0$ and

$$xy + yz + zx \ge \frac{3}{4},$$

which, by returning to the substitution

$$x = \frac{a}{b+c}, \quad y = \frac{b}{c+a}, \quad z = \frac{c}{a+b},$$

is equivalent to $\sum a(b-c)^2 \ge 0$.

In conclusion, the solutions of the problem are the real numbers $k$ such
that $4k^2 + 2k \ge 1$. □







We end this section with a very challenging problem, which answers the
following natural question: what would be the analogue of the classical
substitution $x = \frac{b+c}{a}$, $y = \frac{c+a}{b}$, $z = \frac{a+b}{c}$ when we have five variables? We warn the
reader that the first solution is really not natural...


14. Let $a_1, a_2, \ldots, a_5$ be positive real numbers such that

$$a_1a_2\cdots a_5 = a_1(1+a_2) + a_2(1+a_3) + \cdots + a_5(1+a_1) + 2.$$

What is the least possible value of $\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_5}$?

Gabriel Dospinescu, Mathematical Reflections
Proof. Consider the linear system

$$x_2 + x_3 = a_1x_1, \quad x_3 + x_4 = a_2x_2, \quad x_4 + x_5 = a_3x_3, \quad x_5 + x_1 = a_4x_4, \quad x_1 + x_2 = a_5x_5.$$

We can try to solve this system by expressing, for example, all variables in
terms of $x_1, x_5$. Namely, we can use the fifth, first and second equation to
express $x_2, x_3, x_4$ in terms of $x_1, x_5$. Replacing the obtained values in the
other two equations and eliminating $x_1, x_5$ between them, we obtain that the
system has a nontrivial solution if and only if

$$\prod_{i=1}^{5} a_i = 2 + \sum_{i=1}^{5} a_i + a_1a_4 + a_4a_2 + a_2a_5 + a_5a_3 + a_3a_1.$$

All the previous (painful!) computations are left to the reader, since they
are far from having anything conceptual. Note that the same result can be
obtained by computing the determinant of the associated matrix. Now, the
previous relation is almost the one given in the statement, up to a permutation
of the variables. So, since the conclusion is symmetric in the five variables, we
will assume from now on that the previous relation is satisfied instead of the
one given in the statement.




The crucial claim is that under the previous hypothesis, the system has
solutions whose unknowns $x_i$ are positive. Probably the easiest way to prove
this is to exhibit such a solution. Well, simply take $x_1 = 1$,

$$x_5 = \frac{1 + a_1 + a_2 + a_3 + a_1a_3}{1 + a_5 + a_3a_5 + a_2a_5}$$

and then define

$$x_4 = \frac{1 + x_5}{a_4}, \quad x_3 = \frac{x_4 + x_5}{a_3}, \quad x_2 = \frac{x_3 + x_4}{a_2}.$$

The question is, of course, how on earth did we choose the value of $x_5$? Well,
simply by solving the system, as indicated in the beginning of the solution.
Note that we clearly have $x_i > 0$ and an easy computation shows that these
are solutions of the system (we only have to check two equations, since three
are satisfied by construction).
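The "easy computation" can be delegated to a machine. The sketch below is an illustration only, with function names of our own choosing: it picks $a_1, \ldots, a_4$, solves the permuted relation (which is linear in $a_5$) for $a_5$, builds $x_1, \ldots, x_5$ as described, and checks all five equations exactly. It assumes the chosen $a_1, \ldots, a_4$ make the resulting $a_5$ positive.

```python
from fractions import Fraction

def fifth_from_relation(a1, a2, a3, a4):
    """Solve prod a_i = 2 + sum a_i + a1a4 + a4a2 + a2a5 + a5a3 + a3a1 for a5."""
    coeff = a1 * a2 * a3 * a4 - 1 - a2 - a3          # coefficient of a5
    const = 2 + a1 + a2 + a3 + a4 + a1 * a4 + a4 * a2 + a3 * a1
    return Fraction(const) / coeff

def positive_solution(a):
    """Build x1..x5 from a = [a1..a5] following the construction in the text."""
    a1, a2, a3, a4, a5 = a
    x1 = Fraction(1)
    x5 = Fraction(1 + a1 + a2 + a3 + a1 * a3) / (1 + a5 + a3 * a5 + a2 * a5)
    x4 = (x1 + x5) / a4
    x3 = (x4 + x5) / a3
    x2 = (x3 + x4) / a2
    return [x1, x2, x3, x4, x5]

def satisfies_system(a, x):
    """Equation i reads x_{i+1} + x_{i+2} = a_i * x_i, with indices mod 5."""
    return all(x[(i + 1) % 5] + x[(i + 2) % 5] == a[i] * x[i] for i in range(5))

if __name__ == "__main__":
    a = [Fraction(1), Fraction(2), Fraction(3), Fraction(4)]
    a.append(fifth_from_relation(*a))
    x = positive_solution(a)
    print(a[4], x, satisfies_system(a, x), sum(1 / v for v in a))
```

For instance, $a_1, \ldots, a_4 = 1, 2, 3, 4$ forces $a_5 = \frac{3}{2}$, and the construction returns $x = (1, \frac12, \frac12, \frac12, 1)$.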

The conclusion is that if the $a_i$'s satisfy the condition

$$\prod_{i=1}^{5} a_i = 2 + \sum_{i=1}^{5} a_i + a_1a_4 + a_4a_2 + a_2a_5 + a_5a_3 + a_3a_1,$$

then we can find $x_i > 0$ such that

$$a_1 = \frac{x_2 + x_3}{x_1}, \quad a_2 = \frac{x_3 + x_4}{x_2}, \quad \ldots, \quad a_5 = \frac{x_1 + x_2}{x_5}.$$

So, the problem reduces to finding the minimal value of

$$\frac{x_1}{x_2 + x_3} + \frac{x_2}{x_3 + x_4} + \cdots + \frac{x_5}{x_1 + x_2}.$$

We claim that the previous expression is always at least 5/2. By Titu's lemma,
this reduces to proving that

$$(x_1 + x_2 + x_3 + x_4 + x_5)^2 \ge \frac{5}{2}\sum_{i<j} x_ix_j$$

(the products $x_i(x_{i+1} + x_{i+2})$ produced by Titu's lemma run over every pair
$i < j$ exactly once). But since

$$2\sum_{i<j} x_ix_j = \left(\sum x_i\right)^2 - \sum x_i^2,$$

this reduces to $\left(\sum x_i\right)^2 \le 5\sum x_i^2$, which is Cauchy-Schwarz.
Putting everything together, we obtain that the minimal value of

$$\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_5}$$

is 5/2, attained when all $a_i$'s are equal to 2. Who wants to play now the same
game with 7 variables? □


Proof. Here is another proof, also far from being evident... We will use the
following nontrivial

Lemma 1.4. For all nonnegative real numbers $x_1, x_2, \ldots, x_5$ we have

$$(x_1 + x_2 + \cdots + x_5)^3 \ge 25(x_1x_2x_3 + x_2x_3x_4 + \cdots + x_5x_1x_2).$$

Proof. This is not easy at all, but here is a very elegant (but unnatural) proof:
consider the identity

$$x_1x_2x_3 + x_2x_3x_4 + x_3x_4x_5 + x_4x_5x_1 + x_5x_1x_2 = x_5(x_1 + x_3)(x_2 + x_4) + x_2x_3(x_1 + x_4 - x_5).$$

We may assume that $x_5 = \min x_i$ (since the inequality is cyclic), so that
$x_1 + x_4 - x_5 \ge 0$. Denoting $x_1 + x_2 + x_3 + x_4 = 4t$ and using the AM-GM
inequality, we deduce that

$$x_5(x_1 + x_3)(x_2 + x_4) + x_2x_3(x_1 + x_4 - x_5) \le 4t^2x_5 + \left(\frac{4t - x_5}{3}\right)^3.$$

Thus, it remains to prove that

$$4t^2x_5 + \left(\frac{4t - x_5}{3}\right)^3 \le \frac{(x_5 + 4t)^3}{25}.$$

By homogeneity, we may assume that $x_5 = 1$ and then expanding everything
the inequality becomes, with the substitution $4t - 1 = 3x$, $(x-1)^2(8x+7) \ge 0$,
which is clear. □
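Both the identity and the lemma itself lend themselves to a quick machine check. The following sketch (our own helper names, integer samples so that all comparisons are exact) verifies the identity used in the proof and tests the inequality of Lemma 1.4 on random nonnegative inputs; it is a sanity check, not a substitute for the argument above.

```python
import random

def cyclic_triples(x):
    """x1x2x3 + x2x3x4 + x3x4x5 + x4x5x1 + x5x1x2 for a list x of length 5."""
    return sum(x[i] * x[(i + 1) % 5] * x[(i + 2) % 5] for i in range(5))

def identity_rhs(x):
    """Right-hand side of the identity used in the proof of Lemma 1.4."""
    x1, x2, x3, x4, x5 = x
    return x5 * (x1 + x3) * (x2 + x4) + x2 * x3 * (x1 + x4 - x5)

def lemma_gap(x):
    """(x1 + ... + x5)^3 - 25 * (cyclic sum); Lemma 1.4 says this is >= 0."""
    return sum(x) ** 3 - 25 * cyclic_triples(x)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[rng.randint(0, 50) for _ in range(5)] for _ in range(10_000)]
    print(all(cyclic_triples(s) == identity_rhs(s) for s in samples))
    print(min(lemma_gap(s) for s in samples) >= 0)
```

The gap vanishes when all five numbers are equal, which is the equality case of the lemma.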




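Using this lemma for the inverses of the $a_i$'s and denoting

$$S = \frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_5},$$

we deduce that

$$S^3 \ge 25\sum_{i=1}^{5}\frac{1}{a_ia_{i+1}a_{i+2}} = 25\,\frac{a_1a_2 + a_2a_3 + \cdots + a_5a_1}{a_1a_2\cdots a_5}.$$

On the other hand, by Mac-Laurin's and AM-GM inequalities we also have

$$S^5 \ge \frac{5^5}{a_1a_2\cdots a_5}, \qquad S^4 \ge 125\,\frac{a_1 + a_2 + \cdots + a_5}{a_1a_2\cdots a_5}.$$

Taking into account these inequalities and the hypothesis, we deduce that

$$\frac{S^3}{25} + \frac{S^4}{125} + \frac{2S^5}{5^5} \ge 1,$$

which immediately implies that $S \ge \frac{5}{2}$. □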


1.3 The relation $a^2 + b^2 + c^2 + 2abc = 1$


The following problem is a rather tricky application of the AM-GM in- 
equality. 


15. Prove that in all acute-angled triangles the following inequality holds

$$\left(\frac{\cos A}{\cos B}\right)^2 + \left(\frac{\cos B}{\cos C}\right)^2 + \left(\frac{\cos C}{\cos A}\right)^2 + 8\cos A\cos B\cos C \ge 4.$$

Titu Andreescu, MOSP 2000


Proof. Since

$$\cos^2A + \cos^2B + \cos^2C + 2\cos A\cos B\cos C = 1,$$

the inequality can be written

$$\left(\frac{\cos A}{\cos B}\right)^2 + \left(\frac{\cos B}{\cos C}\right)^2 + \left(\frac{\cos C}{\cos A}\right)^2 \ge 4(\cos^2A + \cos^2B + \cos^2C).$$

Let

$$x = \cos^2A, \quad y = \cos^2B, \quad z = \cos^2C,$$

so we need to prove that

$$\frac{x}{y} + \frac{y}{z} + \frac{z}{x} \ge 4(x + y + z).$$

The point is that we actually have

$$\frac{x}{y} + \frac{y}{z} + \frac{z}{x} \ge \frac{x + y + z}{\sqrt[3]{xyz}}$$

for any positive real numbers $x, y, z$, as follows by adding up the AM-GM
inequalities

$$\frac{2x}{y} + \frac{y}{z} \ge \frac{3x}{\sqrt[3]{xyz}}.$$

Thus, it is enough to prove that $xyz \le \frac{1}{64}$. But this is well-known and
easy to prove. □


The following problem is a preparation for a hard problem to come, but 
has also an independent interest. 


16. Prove that in every acute-angled triangle $ABC$,

$$(\cos A + \cos B)^2 + (\cos B + \cos C)^2 + (\cos C + \cos A)^2 \le 3.$$


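Proof. Letting

$$\cos A = \sqrt{\frac{1}{yz}}, \quad \cos B = \sqrt{\frac{1}{zx}}, \quad \cos C = \sqrt{\frac{1}{xy}}$$

for some positive real $x, y, z$, we obtain the relation $xyz = x + y + z + 2$ from
the classical identity $\sum\cos^2A + 2\prod\cos A = 1$. Thus we can find positive real
numbers $a, b, c$ such that

$$x = \frac{b+c}{a}, \quad y = \frac{c+a}{b}, \quad z = \frac{a+b}{c}.$$

The desired inequality can be written

$$2\sum\frac{1}{xy} + 2\sum\frac{1}{x\sqrt{yz}} \le 3 \iff 2(x+y+z) + 2\sum\sqrt{xy} \le 3(x+y+z+2) \iff \left(\sum\sqrt{x}\right)^2 \le 2\left(\sum x + 3\right).$$

This follows from Cauchy-Schwarz

$$\left(\sum\sqrt{x}\right)^2 = \left(\sum\sqrt{\frac{b+c}{a}}\right)^2 \le \left(\sum (b+c)\right)\left(\sum\frac{1}{a}\right) = 2\sum x + 6.$$ □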


Proof. Since the triangle is acute, there are positive numbers $x, y, z$ such that

$$a^2 = y + z, \quad b^2 = z + x, \quad c^2 = x + y.$$

Then

$$\cos A = \frac{x}{\sqrt{(x+y)(x+z)}}$$

and similar identities for $\cos B$, $\cos C$. The desired inequality can be written
as

$$\sum\frac{x^2}{(x+y)(x+z)} + \sum\frac{yz}{(y+z)\sqrt{(x+y)(x+z)}} \le \frac{3}{2},$$

which (after clearing denominators) is equivalent to

$$\sum\left(x^2(y+z) + yz\sqrt{(x+y)(x+z)}\right) \le \frac{3}{2}(x+y)(y+z)(z+x)$$

and then to

$$\sum_{cyc}\sqrt{yz(x+y)\cdot yz(x+z)} \le 3xyz + \frac{1}{2}\sum_{cyc} yz(y+z).$$

However, this follows immediately from the AM-GM inequality:

$$\sqrt{yz(x+y)\cdot yz(x+z)} \le xyz + \frac{1}{2}yz(y+z)$$

and two similar inequalities. □


Even though the following problem seems classical, it is actually rather 
hard. Fortunately, we did the difficult job in the previous problem. We 
also present an independent and very elegant approach due to Oaz Nair and 
Richard Stong. 


17. Prove that if $a, b, c \ge 0$ satisfy $a^2 + b^2 + c^2 + abc = 4$, then

$$0 \le ab + bc + ca - abc \le 2.$$

Titu Andreescu, USAMO 2001


Proof. The inequality on the left is easy: the hypothesis and the AM-GM
inequality imply that $abc \le 1$, so that $\min(a, b, c) \le 1$. We may assume that
$c = \min(a, b, c)$, so that

$$ab + bc + ca - abc = c(a + b) + ab(1 - c) \ge 0.$$

The hard point is proving that $ab + bc + ca - abc \le 2$. Taking into account
the hypothesis, this can also be written as

$$\left(\frac{a+b}{2}\right)^2 + \left(\frac{b+c}{2}\right)^2 + \left(\frac{a+c}{2}\right)^2 \le 3.$$

If we denote $a = 2x$, $b = 2y$, $c = 2z$ then $x^2 + y^2 + z^2 + 2xyz = 1$ and so
there exists a triangle $ABC$ such that $x = \cos A$, $y = \cos B$, $z = \cos C$.
Thus, the problem reduces to the previous one (since $a, b, c$ are nonnegative,
the triangle $ABC$ is acute angled). □











Proof. Here is a very elegant proof for the hard part of the inequality. Among
the three numbers $a - 1$, $b - 1$, $c - 1$, two have the same sign, say $b - 1$ and $c - 1$.
Thus $(1 - b)(1 - c) \ge 0$, implying that $b + c \le bc + 1$ and $ab + ac - abc \le a$.
Thus, it is enough to prove that $a + bc \le 2$. But the given condition and the
fact that $a$ is nonnegative imply that

$$a = \frac{-bc + \sqrt{(b^2 - 4)(c^2 - 4)}}{2}.$$

Thus, it is enough to prove that $\sqrt{(b^2 - 4)(c^2 - 4)} \le 4 - bc$. Note that $bc \le 4$,
since $b^2 \le 4$ and $c^2 \le 4$. Squaring the previous inequality thus yields an
equivalent one

$$b^2c^2 - 4b^2 - 4c^2 + 16 \le b^2c^2 - 8bc + 16,$$

which is obvious. □
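The explicit formula for $a$ in terms of $b$ and $c$ makes the statement of problem 17 easy to probe numerically. The sketch below (helper names are ours) parametrizes the constraint surface by $b, c$ with $b^2 + c^2 \le 4$, which is exactly the range where the root given by the formula is nonnegative, and evaluates the middle expression; the tolerances only absorb floating-point rounding.

```python
import math
import random

def a_from_bc(b, c):
    """Root a of a^2 + b^2 + c^2 + abc = 4 given by the formula in the text.

    It is nonnegative exactly when b, c >= 0 and b^2 + c^2 <= 4.
    """
    return (-b * c + math.sqrt((4 - b * b) * (4 - c * c))) / 2

def middle_expression(a, b, c):
    """ab + bc + ca - abc, claimed to lie in [0, 2] on the constraint surface."""
    return a * b + b * c + c * a - a * b * c

if __name__ == "__main__":
    rng = random.Random(0)
    values = []
    while len(values) < 10_000:
        b, c = rng.uniform(0, 2), rng.uniform(0, 2)
        if b * b + c * c <= 4:
            values.append(middle_expression(a_from_bc(b, c), b, c))
    print(min(values), max(values))
```

The upper bound 2 is attained at $a = b = c = 1$.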


We end this chapter with another challenging problem, which reduces after 
some tricky algebraic manipulations to the infamous “Iran 1996 inequality.” 


18. Prove that in all acute-angled triangles the following inequality holds

$$\frac{a^2b^2}{c^2} + \frac{a^2c^2}{b^2} + \frac{b^2c^2}{a^2} \ge 9R^2.$$

Nguyen Son Ha


Proof. If $A, B, C$ are the angles of the triangle, the sine law and the identity
$\sin^2x + \cos^2x = 1$ yield the equivalent form of the inequality

$$\sum\frac{(1 - \cos^2A)(1 - \cos^2B)}{1 - \cos^2C} \ge \frac{9}{4}.$$

Write

$$\cos^2A = yz, \quad \cos^2B = zx, \quad \cos^2C = xy.$$

Then $xy + yz + zx + 2xyz = 1$ and so there exist positive numbers $X, Y, Z$
such that

$$x = \frac{X}{Y+Z}, \quad y = \frac{Y}{Z+X}, \quad z = \frac{Z}{X+Y}.$$

We want to prove that

$$\sum\frac{(1 - yz)(1 - zx)}{1 - xy} \ge \frac{9}{4}.$$

On the other hand, we have

$$1 - yz = \frac{X(X+Y+Z)}{(X+Y)(X+Z)},$$

so that the inequality is equivalent to

$$\sum\frac{XY(X+Y+Z)}{Z(X+Y)^2} \ge \frac{9}{4}.$$

This can also be written in the form

$$XYZ(X+Y+Z)\cdot\sum\frac{1}{(XZ+YZ)^2} \ge \frac{9}{4}$$


and it is a consequence of the following famous inequality.

Lemma 1.5. For all positive numbers $a, b, c$ we have

$$(ab + bc + ca)\left(\frac{1}{(a+b)^2} + \frac{1}{(b+c)^2} + \frac{1}{(c+a)^2}\right) \ge \frac{9}{4}.$$

Proof. We may assume that $a \ge b \ge c$. First, we show that

$$\frac{1}{(a+b)^2} + \frac{1}{(a+c)^2} + \frac{1}{(b+c)^2} \ge \frac{1}{4ab} + \frac{2}{(a+c)(b+c)}.$$

This can be rewritten

$$\left(\frac{1}{a+c} - \frac{1}{b+c}\right)^2 \ge \frac{(a-b)^2}{4ab(a+b)^2},$$

or equivalently $4ab(a+b)^2 \ge (a+c)^2(b+c)^2$. This is clear, as $(a+b)^2 \ge (a+c)^2$
and $4ab \ge (b+c)^2$.

Thus, it remains to prove that

$$(ab + bc + ca)\left(\frac{1}{4ab} + \frac{2}{(a+c)(b+c)}\right) \ge \frac{9}{4}.$$

Using the identities

$$\frac{ab+bc+ca}{4ab} = \frac{1}{4} + \frac{c(a+b)}{4ab}, \qquad \frac{2(ab+bc+ca)}{(a+c)(b+c)} = 2 - \frac{2c^2}{(a+c)(b+c)},$$

this becomes

$$\frac{c(a+b)}{4ab} \ge \frac{2c^2}{(a+c)(b+c)}.$$

But this is simply the standard inequality $(a+b)(a+c)(b+c) \ge 8abc$ and we
are done. □

This completes the solution to the problem. □
1.4 Notes 


We would like to thank the following people for their solutions: Vo Quoc 
Ba Can (problems 13, 18), Xiangyi Huang (problems 4, 5), Logeswaran La- 
janugen (problem 1, 9), Oaz Nair (problem 17), Dusan Sobot (problems 2, 9, 
16), Richard Stong (problems 8, 17), Gjergji Zaimi (problems 2, 3, 6, 7, 8, 11, 
12, 15, 16, 17). 


Chapter 2 


Always Cauchy-Schwarz... 


As the title suggests, all problems in this chapter can be solved using the 
Cauchy-Schwarz inequality, even though sometimes this will require quite a 
lot of work. Let us recall the statement of the Cauchy-Schwarz inequality: if
$a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are real numbers, then

$$(a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) \ge (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2.$$

This follows easily from the fact that $\sum_{i=1}^{n}(a_ix + b_i)^2 \ge 0$ for all real numbers
$x$. Indeed, the left-hand side is a quadratic function of $x$ with nonnegative
values, so its discriminant is negative or 0. But this is precisely the content of
the Cauchy-Schwarz inequality. Another proof is based on Lagrange's identity

$$\left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right) - \left(\sum_{i=1}^{n} a_ib_i\right)^2 = \sum_{1\le i<j\le n}(a_ib_j - a_jb_i)^2.$$


This useful identity will be used several times in this chapter. 
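Readers who like to see an identity in action before using it may find the following small sketch helpful. It evaluates both sides of Lagrange's identity for given vectors; the function name is ours, and with integer (or `Fraction`) inputs the comparison is exact.

```python
from itertools import combinations

def lagrange_sides(a, b):
    """Return (lhs, rhs) of Lagrange's identity for equal-length sequences a, b.

    lhs = (sum a_i^2)(sum b_i^2) - (sum a_i b_i)^2
    rhs = sum over i < j of (a_i b_j - a_j b_i)^2
    """
    lhs = (sum(x * x for x in a) * sum(y * y for y in b)
           - sum(x * y for x, y in zip(a, b)) ** 2)
    rhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
              for i, j in combinations(range(len(a)), 2))
    return lhs, rhs

if __name__ == "__main__":
    print(lagrange_sides([1, 2, 3], [4, 5, 6]))
```

For $a = (1, 2, 3)$ and $b = (4, 5, 6)$ both sides equal 54. Since the right-hand side is a sum of squares, the Cauchy-Schwarz inequality follows at once, with equality exactly when all $a_ib_j - a_jb_i$ vanish.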

As the best way to get familiar with this inequality is via a lot of examples 
of all levels of difficulty, we will not insist on any theoretical aspects and go 
directly to battle. We start with two easy examples, destined to give the 
reader some confidence. He will surely need it for the more difficult problems 
to come... 




1. Let $a, b, c$ be nonnegative real numbers. Prove that

$$(ax^2 + bx + c)(cx^2 + bx + a) \ge (a + b + c)^2x^2$$

for all nonnegative real numbers $x$.


Titu Andreescu, Gazeta Matematica 


Proof. This is just a matter of re-arranging terms and applying Cauchy-Schwarz:

$$(ax^2 + bx + c)(cx^2 + bx + a) = (ax^2 + bx + c)(a + bx + cx^2) \ge (ax + bx + cx)^2 = (a + b + c)^2x^2.$$ □


2. Let $p$ be a polynomial with positive real coefficients. Prove that if
$p\left(\frac{1}{x}\right) \ge \frac{1}{p(x)}$ is true for $x = 1$, then it is true for all $x > 0$.

Titu Andreescu, Revista Matematica Timişoara


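Proof. Write $p(X) = a_0 + a_1X + \cdots + a_nX^n$ and observe that

$$p(x)\,p\!\left(\frac{1}{x}\right) = (a_0 + a_1x + \cdots + a_nx^n)\left(a_0 + \frac{a_1}{x} + \cdots + \frac{a_n}{x^n}\right) \ge (a_0 + a_1 + \cdots + a_n)^2 = p(1)^2.$$

The result follows. □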


The following exercise is already a bit trickier, due to the lack of symmetry. 


3. Prove that for all real numbers $a, b, c \ge 1$ the following inequality holds:

$$\sqrt{a-1} + \sqrt{b-1} + \sqrt{c-1} \le \sqrt{a(bc+1)}.$$

Proof. The inequality being symmetric in $b, c$, but not in $a$, it is natural to
deal first with $\sqrt{b-1} + \sqrt{c-1}$. This is easy to bound using Cauchy-Schwarz:

$$\sqrt{b-1} + \sqrt{c-1} \le \sqrt{(b-1+1)(1+c-1)} = \sqrt{bc}.$$

So, it is enough to prove that $\sqrt{bc} + \sqrt{a-1} \le \sqrt{a(bc+1)}$. But this is once
more the Cauchy-Schwarz inequality. □


Another easy, but a bit exotic application of the Cauchy-Schwarz inequal- 
ity is the following Chinese olympiad problem. 


4. Let $n$ be a positive integer. Find the number of ordered $n$-tuples of
integers $(a_1, a_2, \ldots, a_n)$ such that

$$a_1 + a_2 + \cdots + a_n \ge n^2 \quad\text{and}\quad a_1^2 + a_2^2 + \cdots + a_n^2 \le n^3 + 1.$$

China 2002

Proof. By the Cauchy-Schwarz inequality we have

$$a_1 + a_2 + \cdots + a_n \le \sqrt{n(n^3+1)} < n^2 + 1.$$

Since $a_i$ are integers and $a_1 + a_2 + \cdots + a_n \ge n^2$, we must have

$$a_1 + a_2 + \cdots + a_n = n^2.$$

But then (again by Cauchy-Schwarz) we have $a_1^2 + a_2^2 + \cdots + a_n^2 \ge n^3$, forcing
$a_1^2 + a_2^2 + \cdots + a_n^2 \in \{n^3, n^3+1\}$. If $a_1^2 + a_2^2 + \cdots + a_n^2 = n^3$, then we must have
equality in Cauchy-Schwarz, implying that all $a_i$'s are equal to $n$. Assume
now that $a_1^2 + a_2^2 + \cdots + a_n^2 = n^3 + 1$ and let $b_i = a_i - n$. Then

$$b_1^2 + b_2^2 + \cdots + b_n^2 = n^3 + 1 - 2n\cdot n^2 + n^3 = 1,$$

forcing all but one $b_i$ to vanish. This is however impossible, as $b_1 + b_2 + \cdots + b_n = 0$.
Therefore, this second case will not occur and the only solution is

$$a_1 = a_2 = \cdots = a_n = n.$$ □
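For small $n$ the answer can be confirmed by exhaustive search, which is a pleasant way to gain confidence in the argument. The sketch below is a brute-force illustration under our own naming; it relies only on the observation that each $|a_i|$ is at most $\sqrt{n^3+1}$, and it is practical only for very small $n$.

```python
from itertools import product
from math import isqrt

def admissible_tuples(n):
    """All integer n-tuples with sum >= n^2 and sum of squares <= n^3 + 1."""
    bound = isqrt(n ** 3 + 1)            # each |a_i| is at most sqrt(n^3 + 1)
    values = range(-bound, bound + 1)
    return [t for t in product(values, repeat=n)
            if sum(t) >= n * n and sum(v * v for v in t) <= n ** 3 + 1]

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, admissible_tuples(n))
```

In each case the search should return the single tuple $(n, n, \ldots, n)$, in agreement with the proof.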


We continue the series of easy exercises: 




5. Let $x_1, x_2, \ldots, x_{10}$ be real numbers between 0 and $\frac{\pi}{2}$ such that

$$\sin^2x_1 + \sin^2x_2 + \cdots + \sin^2x_{10} = 1.$$

Prove that

$$3(\sin x_1 + \sin x_2 + \cdots + \sin x_{10}) \le \cos x_1 + \cos x_2 + \cdots + \cos x_{10}.$$

Saint Petersburg 2001

Proof. The Cauchy-Schwarz inequality and the hypothesis yield

$$\cos x_1 = \sqrt{\sin^2x_2 + \sin^2x_3 + \cdots + \sin^2x_{10}} \ge \frac{1}{3}(\sin x_2 + \sin x_3 + \cdots + \sin x_{10})$$

and similarly for the other variables. The result follows by adding up these
inequalities. □


The following exercise combines an easy application of the Cauchy- 
Schwarz inequality with some classical formulae from geometry. Recall that r 
is the inradius and that s is the semi-perimeter of a triangle. 


6. The triangle $ABC$ satisfies

$$\left(\cot\frac{A}{2}\right)^2 + \left(2\cot\frac{B}{2}\right)^2 + \left(3\cot\frac{C}{2}\right)^2 = \left(\frac{6s}{7r}\right)^2.$$

Show that $ABC$ is similar to a triangle whose sides are integers, and
find the smallest set of such integers.

Titu Andreescu, USAMO 2002

Proof. Let the incircle be tangent to the sides of the triangle at points which
split the sides into segments of length $x$ and $y$, $x$ and $z$, and $y$ and $z$. Thus,
the sides of the triangle have lengths $x + y$, $x + z$, $y + z$.

The equation

$$\left(\cot\frac{A}{2}\right)^2 + \left(2\cot\frac{B}{2}\right)^2 + \left(3\cot\frac{C}{2}\right)^2 = \left(\frac{6s}{7r}\right)^2$$

can also be rewritten as

$$\frac{z^2}{r^2} + 4\,\frac{y^2}{r^2} + 9\,\frac{x^2}{r^2} = \left(\frac{6(x+y+z)}{7r}\right)^2.$$

This simplifies to

$$49(z^2 + 4y^2 + 9x^2) = 36(x + y + z)^2.$$

Now, by Cauchy-Schwarz we have

$$\left(1 + \frac{1}{4} + \frac{1}{9}\right)(z^2 + 4y^2 + 9x^2) \ge (x + y + z)^2,$$

yielding

$$49(z^2 + 4y^2 + 9x^2) \ge 36(x + y + z)^2.$$

Thus, we are in the equality case of the Cauchy-Schwarz inequality, which is
precisely the case when there is a $k > 0$ such that $x = 4k$, $y = 9k$, $z = 36k$.
Then the sides of the triangle are $13k$, $40k$, $45k$. Thus the triangle is similar
to the triangle with sidelengths 13, 40, 45, and the problem is solved. □


The statement of the following problem looks rather classical. There are 
however some technical problems which make the problem more difficult than 
expected. 


7. Let $n \ge 2$ be an even integer. We consider all polynomials of the form
$x^n + a_{n-1}x^{n-1} + \cdots + a_1x + 1$, with real coefficients and having at least
one real zero. Determine the least possible value of $a_1^2 + a_2^2 + \cdots + a_{n-1}^2$.

Czech-Polish-Slovak Competition 2002




Proof. Suppose that the corresponding polynomial has a real zero $x$. Using
Cauchy-Schwarz, we obtain

$$(x^n + 1)^2 = (a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1})^2 \le (a_1^2 + a_2^2 + \cdots + a_{n-1}^2)(x^2 + x^4 + \cdots + x^{2(n-1)}).$$

Thus

$$a_1^2 + a_2^2 + \cdots + a_{n-1}^2 \ge \frac{(x^n + 1)^2}{x^2 + x^4 + \cdots + x^{2(n-1)}} = f(x)$$

and so we need to find first the minimal value of $f$. Now, looking at the
zeros of the derivative of $f$ suggests that the minimal value might be taken at
$x = 1$, so that it equals $\frac{4}{n-1}$. This is not easy to prove using derivatives, as
the computations are a bit nasty. Instead, we will prove in an elementary way
that

$$\frac{n-1}{4}(x^n + 1)^2 \ge x^2 + x^4 + \cdots + x^{2(n-1)}.$$

In order to prove this, note that we have the inequalities

$$x^n + 1 \ge x^2 + x^{n-2}, \quad x^n + 1 \ge x^4 + x^{n-4}, \quad \ldots, \quad x^n + 1 \ge x^{n-2} + x^2.$$

Adding them shows that

$$\frac{n-2}{4}(x^n + 1) \ge x^2 + x^4 + \cdots + x^{n-2}.$$

On the other hand, multiplying the last inequality by $x^n$ and adding the result
to the previous inequality gives the inequality

$$\frac{n-2}{4}(x^n + 1)^2 \ge x^2 + \cdots + x^{n-2} + x^{n+2} + \cdots + x^{2(n-1)}.$$

Thus it remains to prove that $(x^n + 1)^2 \ge 4x^n$, which is clear.

The previous paragraph shows that

$$a_1^2 + a_2^2 + \cdots + a_{n-1}^2 \ge \frac{4}{n-1}$$

if $x^n + a_{n-1}x^{n-1} + \cdots + a_1x + 1$ has at least one real zero. On the other hand,
choosing $a_1 = a_2 = \cdots = a_{n-1} = -\frac{2}{n-1}$ shows that $\frac{4}{n-1}$ is optimal. □
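Since the text only says that the derivative "suggests" the minimum, a numerical look at $f$ may be instructive. The sketch below (our own helper names; floating-point arithmetic, so comparisons are only approximate) evaluates $f$ on a grid and checks that the polynomial with all middle coefficients equal to $-\frac{2}{n-1}$ indeed vanishes at $x = 1$.

```python
def f(x, n):
    """Cauchy-Schwarz lower bound (x^n + 1)^2 / (x^2 + x^4 + ... + x^(2(n-1)))."""
    return (x ** n + 1) ** 2 / sum(x ** (2 * k) for k in range(1, n))

def extremal_polynomial(x, n):
    """x^n + a*(x^(n-1) + ... + x) + 1 with every middle coefficient a = -2/(n-1)."""
    a = -2 / (n - 1)
    return x ** n + a * sum(x ** k for k in range(1, n)) + 1

if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        grid = [k / 50 for k in range(-200, 201) if k != 0]
        print(n, min(f(x, n) for x in grid), 4 / (n - 1), extremal_polynomial(1.0, n))
```

On such a grid the smallest value of $f$ should agree with $\frac{4}{n-1}$ up to rounding, attained at $x = \pm 1$ (recall that $n$ is even).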




The denominators in the following inequality look awful, but a clever 
application of the Cauchy-Schwarz inequality can make all of them equal. 
The method of proof is worth remembering, since it appears quite often. 


8. Prove that for any positive real numbers $x, y, z$ such that $xyz \ge 1$ the
following inequality holds

$$\frac{x}{x^3 + y^2 + z} + \frac{y}{y^3 + z^2 + x} + \frac{z}{z^3 + x^2 + y} \le 1.$$

Tuan Le, KöMaL magazine

Proof. Using Cauchy-Schwarz, we obtain

$$(x^3 + y^2 + z)\left(\frac{1}{x} + 1 + z\right) \ge (x + y + z)^2.$$

Thus

$$\frac{x}{x^3 + y^2 + z} \le \frac{x\left(\frac{1}{x} + 1 + z\right)}{(x + y + z)^2}.$$

Writing down similar inequalities for the second and third terms of the
left-hand side, we reduce the problem to proving that

$$\sum(1 + x + xz) \le (x + y + z)^2 \iff \sum x^2 + \sum xy \ge 3 + \sum x.$$

This is immediate from

$$\sum xy \ge 3\sqrt[3]{(xyz)^2} \ge 3$$

and

$$\sum x^2 \ge \frac{(x + y + z)^2}{3} \ge \sqrt[3]{xyz}\,(x + y + z) \ge x + y + z.$$ □


Here is a rather nice-looking inequality taken from a Romanian Team 
Selection Test. There are plenty of ways to prove it, but what follows is 
particularly elegant and natural. 




9. Let $n > 2$ and let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ be real numbers such
that

$$a_1^2 + a_2^2 + \cdots + a_n^2 = b_1^2 + b_2^2 + \cdots + b_n^2 = 1$$

and $a_1b_1 + a_2b_2 + \cdots + a_nb_n = 0$. Prove that

$$(a_1 + a_2 + \cdots + a_n)^2 + (b_1 + b_2 + \cdots + b_n)^2 \le n.$$

Cezar and Tudorel Lupu, Romanian TST 2007

Proof. The proof mimics the proof of Cauchy-Schwarz: take any real number
$x$ and apply the Cauchy-Schwarz inequality to obtain

$$\sum_{i=1}^{n}(a_ix + b_i)^2 \ge \frac{1}{n}\left(\sum_{i=1}^{n}(a_ix + b_i)\right)^2.$$

By hypothesis, the left-hand side equals $1 + x^2$. Therefore

$$\left(x\sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i\right)^2 \le n(x^2 + 1)$$

for any real number $x$. The difference between the right-hand side and the
left-hand side being a quadratic polynomial which takes nonnegative values
on the whole real line, its discriminant has to be negative or zero, thus

$$4A^2B^2 \le 4(n - A^2)(n - B^2),$$

where $A = \sum_{i=1}^{n} a_i$ and $B = \sum_{i=1}^{n} b_i$. It is immediate to check that this is
equivalent to the desired inequality. □
The following problem is easy, but the lack of symmetry might make it 


appear more difficult than it really is. 


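10. Find the largest real number $T$ with the following property: if $a, b, c, d, e$
are nonnegative real numbers such that $a + b = c + d + e$, then

$$\sqrt{a^2 + b^2 + c^2 + d^2 + e^2} \ge T\left(\sqrt{a} + \sqrt{b} + \sqrt{c} + \sqrt{d} + \sqrt{e}\right)^2.$$

Proof. This is a simple application of the Cauchy-Schwarz inequality. Namely,
we have

$$a^2 + b^2 \ge \frac{(a+b)^2}{2}, \qquad c^2 + d^2 + e^2 \ge \frac{(c+d+e)^2}{3} = \frac{(a+b)^2}{3},$$

so that

$$\sqrt{a^2 + b^2 + c^2 + d^2 + e^2} \ge \frac{\sqrt{30}}{6}(a + b).$$

On the other hand,

$$\sqrt{a} + \sqrt{b} \le \sqrt{2(a+b)}, \qquad \sqrt{c} + \sqrt{d} + \sqrt{e} \le \sqrt{3(c+d+e)} = \sqrt{3(a+b)},$$

therefore

$$\left(\sqrt{a} + \sqrt{b} + \sqrt{c} + \sqrt{d} + \sqrt{e}\right)^2 \le (\sqrt{2} + \sqrt{3})^2(a + b).$$

Combining these two inequalities yields the estimate

$$\sqrt{a^2 + b^2 + c^2 + d^2 + e^2} \ge \frac{\sqrt{30}}{6(\sqrt{2} + \sqrt{3})^2}\left(\sqrt{a} + \sqrt{b} + \sqrt{c} + \sqrt{d} + \sqrt{e}\right)^2.$$

To see that this is optimal, it suffices to keep track of the equalities in the
previous inequalities. For instance, we can take $a = b = 3$ and $c = d = e = 2$.
Thus the answer is $T = \frac{\sqrt{30}}{6(\sqrt{3} + \sqrt{2})^2}$. □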
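The optimal constant is easy to confirm numerically. The following sketch (helper names are ours; floating-point arithmetic) compares the ratio of the two sides at the equality case $a = b = 3$, $c = d = e = 2$ with the claimed value of $T$, and samples random admissible points to see that the ratio does not drop below it.

```python
import math
import random

def t_star():
    """The claimed optimal constant sqrt(30) / (6 (sqrt 2 + sqrt 3)^2)."""
    return math.sqrt(30) / (6 * (math.sqrt(2) + math.sqrt(3)) ** 2)

def ratio(a, b, c, d, e):
    """sqrt(a^2 + ... + e^2) divided by (sqrt a + ... + sqrt e)^2."""
    root_sum = sum(math.sqrt(v) for v in (a, b, c, d, e))
    return math.sqrt(a * a + b * b + c * c + d * d + e * e) / root_sum ** 2

def random_admissible(rng):
    """A random nonnegative 5-tuple with a + b = c + d + e."""
    c, d, e = (rng.uniform(0.01, 1.0) for _ in range(3))
    a = rng.uniform(0.0, 1.0) * (c + d + e)
    return a, (c + d + e) - a, c, d, e

if __name__ == "__main__":
    rng = random.Random(0)
    print(t_star(), ratio(3, 3, 2, 2, 2))
    print(min(ratio(*random_admissible(rng)) for _ in range(100_000)) >= t_star())
```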


The next problem requires a whole series of applications of Cauchy- 
Schwarz. 


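11. Let $x_1, x_2, \ldots, x_n$ be positive real numbers such that

$$\frac{1}{1+x_1} + \frac{1}{1+x_2} + \cdots + \frac{1}{1+x_n} = 1.$$

Prove the inequality

$$\sqrt{x_1} + \sqrt{x_2} + \cdots + \sqrt{x_n} \ge (n-1)\left(\frac{1}{\sqrt{x_1}} + \frac{1}{\sqrt{x_2}} + \cdots + \frac{1}{\sqrt{x_n}}\right).$$

Vojtech Jarnik Competition 2002

Proof. Let $a_i = \frac{1}{1+x_i}$, so that $\sum a_i = 1$ and $x_i = \frac{1-a_i}{a_i}$. Then, using
Cauchy-Schwarz we can write:

$$\sum\sqrt{x_i} = \sum\sqrt{\frac{a_2 + a_3 + \cdots + a_n}{a_1}} \ge \sum\frac{\sqrt{a_2} + \sqrt{a_3} + \cdots + \sqrt{a_n}}{\sqrt{n-1}\cdot\sqrt{a_1}}.$$

Rearranging terms in the previous sum gives

$$\frac{1}{\sqrt{n-1}}\sum\sqrt{a_1}\left(\frac{1}{\sqrt{a_2}} + \frac{1}{\sqrt{a_3}} + \cdots + \frac{1}{\sqrt{a_n}}\right)$$

and using again Cauchy-Schwarz we can bound this from below by

$$\frac{1}{\sqrt{n-1}}\sum\frac{(n-1)^2\sqrt{a_1}}{\sqrt{a_2} + \sqrt{a_3} + \cdots + \sqrt{a_n}}.$$

Using once more Cauchy-Schwarz for the denominators of each fraction

$$\sqrt{a_2} + \sqrt{a_3} + \cdots + \sqrt{a_n} \le \sqrt{(n-1)(a_2 + a_3 + \cdots + a_n)}$$

yields the desired inequality. □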


The idea of the following example is worth keeping in mind, since it turns 
out to be useful in a wide range of problems. 


12. For $n \ge 2$ let $a_1, a_2, \ldots, a_n$ be positive real numbers such that

$$(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \le \left(n + \frac{1}{2}\right)^2.$$

Prove that $\max(a_1, a_2, \ldots, a_n) \le 4\min(a_1, a_2, \ldots, a_n)$.

Titu Andreescu, USAMO 2009

Proof. The idea is to fix two of the variables, say $a_1, a_2$ and apply the
Cauchy-Schwarz inequality to get rid of the remaining variables. Explicitly, this can
be written in the form

$$\left(n + \frac{1}{2}\right)^2 \ge (a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \ge \left(\sqrt{(a_1 + a_2)\left(\frac{1}{a_1} + \frac{1}{a_2}\right)} + n - 2\right)^2,$$

where the first inequality is simply the hypothesis, while the second one is
Cauchy-Schwarz applied to $n - 1$ terms, grouping the terms indexed 1 and
2 in each multiplicand. We deduce that $\frac{(a_1+a_2)^2}{a_1a_2} \le \frac{25}{4}$, which immediately
implies that

$$\max(a_1, a_2) \le 4\min(a_1, a_2).$$

Since everything is symmetric in $a_1, a_2, \ldots, a_n$, the result follows. □
A bit more difficult, but with a similar flavor is the following problem. 


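13. Let $n > 2$ and let $x_1, x_2, \ldots, x_n$ be positive real numbers such that

$$(x_1 + x_2 + \cdots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right) = n^2 + 1.$$

Prove that

$$(x_1^2 + x_2^2 + \cdots + x_n^2)\left(\frac{1}{x_1^2} + \frac{1}{x_2^2} + \cdots + \frac{1}{x_n^2}\right) > n^2 + 4 + \frac{2}{n(n-1)}.$$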


Gabriel Dospinescu 


Proof. The crucial idea is to write the hypothesis in a different way: expanding
the product and rearranging terms shows that the hypothesis can also be
written

$$\sum_{1 \le i < j \le n}\left(\sqrt{\frac{x_i}{x_j}} - \sqrt{\frac{x_j}{x_i}}\right)^2 = 1.$$

Let us write

$$t_{ij} = \left(\sqrt{\frac{x_i}{x_j}} - \sqrt{\frac{x_j}{x_i}}\right)^2.$$

Now, expanding again gives us

$$\left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n \frac{1}{x_i^2}\right) - n^2 = \sum_{1 \le i < j \le n}\left(\frac{x_i}{x_j} - \frac{x_j}{x_i}\right)^2 = \sum_{1 \le i < j \le n} t_{ij}\left(\sqrt{\frac{x_i}{x_j}} + \sqrt{\frac{x_j}{x_i}}\right)^2.$$

Since

$$\left(\sqrt{\frac{x_i}{x_j}} + \sqrt{\frac{x_j}{x_i}}\right)^2 = 4 + t_{ij},$$

the inequality to be proved becomes

$$\sum_{1 \le i < j \le n}(4 + t_{ij})\,t_{ij} > 4 + \frac{2}{n(n-1)},$$

which is equivalent (taking into account the hypothesis that the sum of all $t_{ij}$
is 1) to

$$\sum_{1 \le i < j \le n} t_{ij}^2 > \frac{2}{n(n-1)}.$$

This follows from the Cauchy-Schwarz inequality, but there is still a detail to be
explained: we have to show that we cannot have equality. But if we had
equality, all $t_{ij}$ would be equal to $\frac{2}{n(n-1)}$ and so all numbers $\frac{x_i}{x_j} + \frac{x_j}{x_i}$ would be
equal. Since $n > 2$, this would force an equality $x_j = x_k$ for some $j \ne k$. But
then $t_{jk} = 0$, a contradiction. □


We continue with an inequality which combines a rather direct application 
of the Cauchy-Schwarz inequality with a nice telescopic identity. We also 
present a beautiful alternate solution, due to Richard Stong. 





14. Prove that for any real numbers $x_1, x_2, \ldots, x_n$ the following inequality
holds:

$$\frac{x_1}{1+x_1^2} + \frac{x_2}{1+x_1^2+x_2^2} + \cdots + \frac{x_n}{1+x_1^2+\cdots+x_n^2} < \sqrt{n}.$$

Bogdan Enescu, IMO Shortlist 2001


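Proof. The solution is very short, but far from obvious. The idea is to use the
Cauchy-Schwarz inequality to reduce the problem to

$$\frac{x_1^2}{(1+x_1^2)^2} + \frac{x_2^2}{(1+x_1^2+x_2^2)^2} + \cdots + \frac{x_n^2}{(1+x_1^2+\cdots+x_n^2)^2} < 1.$$

Thus, we need to prove that for any $a_1, a_2, \ldots, a_n \ge 0$ we have

$$\sum_{i=1}^n \frac{a_i}{(1 + a_1 + \cdots + a_i)^2} < 1.$$

Define $S_i = a_1 + a_2 + \cdots + a_i$, with the convention $S_0 = 0$. This is an increasing
sequence and so

$$\sum_{i=1}^n \frac{a_i}{(1+S_i)^2} = \sum_{i=1}^n \frac{S_i - S_{i-1}}{(1+S_i)^2} \le \sum_{i=1}^n \frac{S_i - S_{i-1}}{(1+S_{i-1})(1+S_i)} = \sum_{i=1}^n\left(\frac{1}{1+S_{i-1}} - \frac{1}{1+S_i}\right) = 1 - \frac{1}{1+S_n} < 1. \quad □$$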


Proof. First note that for any nonnegative real numbers $c$ and $A$ we have

$$\frac{x}{1+A^2+x^2} + \frac{\sqrt{c}}{\sqrt{1+A^2+x^2}} \le \frac{x + \sqrt{c}\cdot\sqrt{1+A^2}}{\sqrt{(1+A^2)(1+A^2+x^2)}} \le \frac{\sqrt{1+c}\cdot\sqrt{1+A^2+x^2}}{\sqrt{(1+A^2)(1+A^2+x^2)}} = \frac{\sqrt{1+c}}{\sqrt{1+A^2}}.$$

Here the first inequality is just $1 + A^2 + x^2 \ge 1 + A^2$ with equality if and
only if $x = 0$; the second is Cauchy-Schwarz with equality if and only if
$x\sqrt{c} = \sqrt{1+A^2}$. Thus we cannot have equality in both cases and the final
inequality is actually strict. Applying this inequality repeatedly, it is easy to
see by downwards induction on $k$ that (for $0 \le k < n$)

$$\frac{x_1}{1+x_1^2} + \cdots + \frac{x_n}{1+x_1^2+\cdots+x_n^2} < \frac{x_1}{1+x_1^2} + \cdots + \frac{x_k}{1+x_1^2+\cdots+x_k^2} + \frac{\sqrt{n-k}}{\sqrt{1+x_1^2+\cdots+x_k^2}}.$$

The case $k = 0$ is the desired result. □




The trick of making the denominators equal thanks to a smart application 
of Cauchy-Schwarz or AM-GM inequality is also used in the following problem. 


15. Prove that if $a, b, c, d$ are positive real numbers, then

$$\frac{a}{b^2+c^2+d^2} + \frac{b}{c^2+d^2+a^2} + \frac{c}{d^2+a^2+b^2} + \frac{d}{a^2+b^2+c^2} > \frac{4}{a+b+c+d}.$$

P.K. Hung

Proof. Using the AM-GM inequality in the form $(x + y)^2 \ge 4xy$, we obtain

$$\frac{a}{b^2+c^2+d^2} \ge \frac{4a^3}{(a^2+b^2+c^2+d^2)^2}.$$

Adding these inequalities yields

$$\sum \frac{a}{b^2+c^2+d^2} \ge \frac{4\sum a^3}{\left(\sum a^2\right)^2}.$$

By Cauchy-Schwarz,

$$\sum a^3 \ge \frac{\left(\sum a^2\right)^2}{\sum a}.$$

Inserting this in the previous inequality yields the desired result (note that
the inequality is strict, since otherwise the equality in the Cauchy-Schwarz
inequality would yield $a = b = c = d$, for which the inequality is strict). Note
that even though the inequalities we used were very rough, the constant 4 is
optimal: simply take $a = b$ and $c, d$ close to 0. □


One needs rather good gymnastics with Cauchy-Schwarz to deal with the 
following problem. 


16. Let $n \ge 2$ be an integer and let $x_1, x_2, \ldots, x_n$ be real numbers satisfying

$$x_1^2 + x_2^2 + \cdots + x_n^2 + x_1x_2 + x_2x_3 + \cdots + x_{n-1}x_n = 1.$$

For a fixed $1 \le k \le n$, find the maximum value that $|x_k|$ can take.


China 1998 


Proof. Let us write the condition in the form

$$x_1^2 + (x_1 + x_2)^2 + (x_2 + x_3)^2 + \cdots + (x_{n-1} + x_n)^2 + x_n^2 = 2.$$

If we fix $1 \le k \le n$ and apply the Cauchy-Schwarz inequality twice, we obtain

$$x_1^2 + (x_1 + x_2)^2 + \cdots + (x_{k-1} + x_k)^2 \ge \frac{\left(x_1 - (x_1 + x_2) + (x_2 + x_3) - \cdots + (-1)^{k-1}(x_{k-1} + x_k)\right)^2}{k} = \frac{x_k^2}{k}$$

and

$$(x_k + x_{k+1})^2 + \cdots + (x_{n-1} + x_n)^2 + x_n^2 \ge \frac{\left((x_k + x_{k+1}) - (x_{k+1} + x_{k+2}) + \cdots + (-1)^{n-k}x_n\right)^2}{n-k+1} = \frac{x_k^2}{n-k+1}.$$

Taking into account the hypothesis and these two inequalities, we deduce
that $2 \ge \frac{(n+1)x_k^2}{k(n+1-k)}$ for all $k$, so that

$$|x_k| \le \sqrt{\frac{2k(n+1-k)}{n+1}}.$$

By studying the equality case in the previous inequalities, one immediately
sees that this value is attainable for each fixed $k$. Specifically, if we define
$a_k = \sqrt{\frac{2k(n+1-k)}{n+1}}$ and take $x_0 = x_{n+1} = 0$ by convention, then equality occurs
if $(-1)^{j-1}x_j$ interpolates linearly and evenly between these endpoints and
$x_k = a_k$. The explicit formula is

$$x_j = \begin{cases} (-1)^{k-j}\,\dfrac{j}{k}\,a_k, & j \le k, \\[2mm] (-1)^{j-k}\,\dfrac{n+1-j}{n+1-k}\,a_k, & j \ge k. \end{cases}$$

Another ingenious application of the Cauchy-Schwarz inequality can be 
found in the following problem. 


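17. Let $a, b, c$ be positive real numbers. Prove that

$$\frac{1}{a^2} + \frac{1}{b^2} + \frac{1}{c^2} + \frac{1}{(a+b+c)^2} \ge \frac{7}{25}\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{a+b+c}\right)^2.$$

Iran 2010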


Proof. We see that we have equality when $a = b = c$, so we have to apply
the Cauchy-Schwarz inequality in a smart way for the left-hand side of the
inequality. Namely, start with

$$\left(1 + 1 + 1 + \frac{1}{9}\right)\left(\frac{1}{a^2} + \frac{1}{b^2} + \frac{1}{c^2} + \frac{1}{(a+b+c)^2}\right) \ge \left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{3(a+b+c)}\right)^2.$$

This reduces the problem to showing that

$$\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{3(a+b+c)} \ge \frac{14}{15}\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{a+b+c}\right).$$

Fortunately, this is equivalent to the classical inequality

$$\frac{1}{a} + \frac{1}{b} + \frac{1}{c} \ge \frac{9}{a+b+c}$$

and we are done. □


The form of the following problem strongly suggests using Cauchy- 
Schwarz. However, it is rather easy to check that many attempts fail... 


18. Let $x, y, z$ be real numbers and let $A, B, C$ be the angles of a triangle.
Prove that

$$x\sin A + y\sin B + z\sin C \le \sqrt{(1+x^2)(1+y^2)(1+z^2)}.$$

Proof. Using that $C = \pi - A - B$, we obtain the identity

$$x\sin A + y\sin B + z\sin C = (x + z\cos B)\sin A + z\sin B\cos A + y\sin B.$$

Using Cauchy-Schwarz and the fact that $\sin^2 A + \cos^2 A = 1$, we obtain

$$(x + z\cos B)\sin A + z\sin B\cos A \le \sqrt{(x + z\cos B)^2 + z^2\sin^2 B} = \sqrt{x^2 + z^2 + 2xz\cos B}.$$

Another application of Cauchy-Schwarz thus yields

$$x\sin A + y\sin B + z\sin C \le \sqrt{(1+y^2)(x^2 + z^2 + 2xz\cos B + \sin^2 B)}.$$

Thus, it remains to prove that

$$x^2 + z^2 + 2xz\cos B + 1 - \cos^2 B \le 1 + x^2 + z^2 + x^2z^2,$$

which is equivalent to $(xz - \cos B)^2 \ge 0$. □


We now enter the zone of challenging problems, with an unusual one. 
The first solution seems to come from nowhere (well, it actually came from 
the author’s imagination. ..), but we also present a beautiful geometric proof 
of Richard Stong, which makes things rather clear. 


19. Let $a, b, c, x, y, z$ be real numbers and let

$$A = ax + by + cz, \quad B = ay + bz + cx, \quad C = az + bx + cy.$$

Assuming that $\min(|A - B|, |B - C|, |C - A|) \ge 1$, find the smallest
possible value of $(a^2 + b^2 + c^2)(x^2 + y^2 + z^2)$.
Adrian Zahariuc, Mathematical Reflections 
Proof. Note that by Lagrange’s identity and the Cauchy-Schwarz inequality
we can write

$$\begin{aligned}(a^2 + b^2 + c^2)(x^2 + y^2 + z^2) &= (ax + by + cz)^2 + (ay - bx)^2 + (bz - cy)^2 + (cx - az)^2 \\ &\ge A^2 + \frac{(ay - bx + bz - cy + cx - az)^2}{3} = A^2 + \frac{|B - C|^2}{3}.\end{aligned}$$

Since everything is symmetric, we obtain

$$(a^2 + b^2 + c^2)(x^2 + y^2 + z^2) \ge \max\left(A^2 + \frac{|B - C|^2}{3},\ B^2 + \frac{|C - A|^2}{3},\ C^2 + \frac{|A - B|^2}{3}\right).$$

Using the hypothesis, it is immediate to check that the last quantity is at
least $\frac{4}{3}$.

To see that the answer of the problem is $\frac{4}{3}$, it remains to find a 6-tuple
$(a, b, c, x, y, z)$ satisfying the conditions of the problem and for which

$$(a^2 + b^2 + c^2)(x^2 + y^2 + z^2) = \frac{4}{3}.$$

Taking $A = 1$, $B = 0$, $C = -1$ (this is a triple which minimizes

$$\max\left(A^2 + \frac{|B - C|^2}{3},\ B^2 + \frac{|C - A|^2}{3},\ C^2 + \frac{|A - B|^2}{3}\right)$$

under the restrictions of the problem), we must ensure that we have equality
in the Cauchy-Schwarz inequality. Solving the corresponding system yields,
after some tedious but easy work, suitable values for $a, b, c, x, y, z$, namely

$$a = \frac{1}{3}, \quad b = 0, \quad c = -\frac{1}{3}, \quad x = y = 1, \quad z = -2. \quad □$$


Proof. Let $u$ be the column vector with entries $(a, b, c)$, $v$ the column vector
with entries $(x, y, z)$ and let $R$ be the linear map

$$R = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

Then $A = u^Tv$, $B = u^TRv$, $C = u^TR^2v$ and we want to minimize $\|u\|^2 \cdot \|v\|^2$.
Note that $R$ acts on $\mathbb{R}^3$ by fixing the line $x = y = z$ and rotating $120^\circ$ in the
orthogonal plane $x + y + z = 0$.

Write $u = u_1 + u_2$, where $u_1$ lies on the line $x = y = z$ and $u_2$ lies in the
plane $x + y + z = 0$ and similarly for $v$. Then

$$\|u\|^2 = \|u_1\|^2 + \|u_2\|^2 \quad \text{and} \quad u^TR^kv = u_1^Tv_1 + u_2^TR^kv_2.$$

Thus the contributions of $u_1$ and $v_1$ to $A, B, C$ are all the same (hence cancel
in $|A - B|$, etc.). Thus clearly the minimum occurs when $u_1 = v_1 = 0$. In
this case, $R$ acts as a $120^\circ$ rotation on $v$, hence $v + Rv + R^2v = 0$ and thus
$A + B + C = 0$. Recall that $\cos\alpha + \cos(\alpha + 2\pi/3) + \cos(\alpha + 4\pi/3) = 0$ for any
$\alpha$. If we now take $\omega$ to be the angle between $u$ and $v$, then we compute

$$\begin{aligned} A^2 + B^2 + C^2 &= \|u\|^2 \cdot \|v\|^2\left(\cos^2\omega + \cos^2(\omega + 2\pi/3) + \cos^2(\omega + 4\pi/3)\right) \\ &= \frac{1}{2}\|u\|^2 \cdot \|v\|^2\left(3 + \cos 2\omega + \cos 2(\omega + 2\pi/3) + \cos 2(\omega + 4\pi/3)\right) \\ &= \frac{3}{2}\|u\|^2 \cdot \|v\|^2.\end{aligned}$$

The conditions $A + B + C = 0$ and $\min(|A - B|, |B - C|, |C - A|) \ge 1$
imply that $A^2 + B^2 + C^2 \ge 2$ (without loss of generality, assume that $A$ and $B$
are nonnegative. As $|A - B| \ge 1$, we have $\max(A, B) \ge 1$, so $A^2 + B^2 + C^2 =
2A^2 + 2B^2 + 2AB \ge 2\max(A, B)^2 \ge 2$) and so

$$\|u\|^2 \cdot \|v\|^2 = (a^2 + b^2 + c^2)(x^2 + y^2 + z^2) \ge \frac{4}{3}.$$

It is easy to see from the proof above that equality is attained. Simply choose
any $(x, y, z)$ with $x + y + z = 0$ and choose $(a, b, c)$ with $a + b + c = 0$,
orthogonal to $(x, y, z)$ so that $A = 0$, and scaled so that $B = 1$. For example,
taking $(x, y, z) = (1, -2, 1)$ we see that $(a, b, c) = (-1, 0, 1)/3$ suffices. □
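
Both extremal 6-tuples for problem 19 can be checked in exact arithmetic. The following is a minimal sketch, not part of the original text: the helper names `abc_values` and `norm_product` are ours, and the first tuple uses $a = 1/3$, $c = -1/3$, which is what the system $A = 1$, $B = 0$, $C = -1$ forces once $b = 0$, $x = y = 1$, $z = -2$.

```python
from fractions import Fraction

def abc_values(a, b, c, x, y, z):
    """Return (A, B, C) as defined in the statement of problem 19."""
    A = a * x + b * y + c * z
    B = a * y + b * z + c * x
    C = a * z + b * x + c * y
    return A, B, C

def norm_product(a, b, c, x, y, z):
    """(a^2 + b^2 + c^2)(x^2 + y^2 + z^2)."""
    return (a * a + b * b + c * c) * (x * x + y * y + z * z)

third = Fraction(1, 3)
# Tuple matching the first proof: a = 1/3, b = 0, c = -1/3, x = y = 1, z = -2.
first = (third, Fraction(0), -third, Fraction(1), Fraction(1), Fraction(-2))
# Tuple from the second proof: (a, b, c) = (-1, 0, 1)/3, (x, y, z) = (1, -2, 1).
second = (-third, Fraction(0), third, Fraction(1), Fraction(-2), Fraction(1))

for candidate in (first, second):
    A, B, C = abc_values(*candidate)
    # The hypothesis min(|A-B|, |B-C|, |C-A|) >= 1 holds ...
    assert min(abs(A - B), abs(B - C), abs(C - A)) >= 1
    # ... and the product attains the claimed minimum 4/3 exactly.
    assert norm_product(*candidate) == Fraction(4, 3)
```

Using `Fraction` avoids any rounding, so the equality with $4/3$ is tested exactly.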


We present three short solutions for the following problem. However, none 
of them is really easy or natural. 


20. Let $a, b, c, d$ be real numbers such that

$$(a^2 + 1)(b^2 + 1)(c^2 + 1)(d^2 + 1) = 16.$$

Prove that

$$-3 \le ab + bc + cd + da + ac + bd - abcd \le 5.$$


Titu Andreescu, Gabriel Dospinescu 


Proof. Write the inequality in the form

$$(ab + bc + cd + da + ac + bd - abcd - 1)^2 \le 16.$$

Next, apply Cauchy-Schwarz in the form

$$(a(b + c + d - bcd) + bc + bd + cd - 1)^2 \le (a^2 + 1)\left[(b + c + d - bcd)^2 + (bc + bd + cd - 1)^2\right].$$

The miracle is that we actually have

$$(b + c + d - bcd)^2 + (bc + bd + cd - 1)^2 = (b^2 + 1)(c^2 + 1)(d^2 + 1).$$

Of course, this can be checked by brute force, but perhaps nicer is determining
it through two applications of Lagrange’s identity. □
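
The "brute force" check of the identity $(b + c + d - bcd)^2 + (bc + bd + cd - 1)^2 = (b^2 + 1)(c^2 + 1)(d^2 + 1)$ can be delegated to a few lines of exact arithmetic. This is only an illustrative sketch (the function names `lhs` and `rhs` are ours): both sides are polynomials of degree at most 2 in each variable, so agreement on any grid with at least three distinct values per variable already forces them to coincide.

```python
from fractions import Fraction
from itertools import product

def lhs(b, c, d):
    return (b + c + d - b * c * d) ** 2 + (b * c + b * d + c * d - 1) ** 2

def rhs(b, c, d):
    return (b * b + 1) * (c * c + 1) * (d * d + 1)

# Nine rational sample values per variable: -2, -3/2, ..., 3/2, 2.
grid = [Fraction(k, 2) for k in range(-4, 5)]
assert all(lhs(b, c, d) == rhs(b, c, d) for b, c, d in product(grid, repeat=3))
```

The grid is larger than strictly necessary; three values per variable would suffice for the degree argument.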


Proof. We will use Lagrange’s identity and Cauchy-Schwarz in the form

$$\prod(a^2 + 1) = \left[(a + b)^2 + (1 - ab)^2\right]\left[(c + d)^2 + (1 - cd)^2\right] \ge \left[(a + b)(c + d) - (1 - ab)(1 - cd)\right]^2.$$

Note that

$$(a + b)(c + d) - (1 - ab)(1 - cd) = -1 + ab + bc + cd + da + ac + bd - abcd.$$

A moment of thought shows that what we have just proved is exactly the
desired inequality. □


Proof. Here is a rather unnatural, but very elegant proof. Consider the poly-
nomial

$$P(X) = (X - a)(X - b)(X - c)(X - d)$$

and observe that the hypothesis becomes $P(i)P(-i) = 16$. This can also be
written as $|P(i)|^2 = 16$. On the other hand, we have

$$P(i) = 1 - \sum ab + abcd + i\left(\sum a - \sum abc\right).$$

We deduce that

$$\left(1 - \sum ab + abcd\right)^2 + \left(\sum a - \sum abc\right)^2 = 16,$$

from where the conclusion follows, as the desired inequality can be written as
$\left|1 - \sum ab + abcd\right| \le 4$.

Straight from the Book, isn’t it? □


We continue with a really nice, but rather technical problem. It re- 
quires some delicate algebraic computations and a rather exotic application 
of Cauchy-Schwarz. We also present a more conceptual proof, due to Richard 
Stong, which uses more advanced tools, but proves much more. 


21. Let $a, b, c, d, e$ be nonnegative real numbers such that $a^2 + b^2 + c^2 = d^2 + e^2$
and $a^4 + b^4 + c^4 = d^4 + e^4$. Prove that $a^3 + b^3 + c^3 \le d^3 + e^3$.


IMC 2006 


Proof. Since we only deal with even exponents in the hypothesis, let us square
the desired inequality and write it in the form

$$\sum a^6 + 2\sum a^3b^3 \le d^6 + e^6 + 2d^3e^3.$$

Next, the identity

$$\sum a^6 - 3a^2b^2c^2 = \frac{3}{2}\left(\sum a^2\right)\left(\sum a^4\right) - \frac{1}{2}\left(\sum a^2\right)^3$$

combined with the hypothesis easily yields

$$a^6 + b^6 + c^6 = 3a^2b^2c^2 + d^6 + e^6.$$

Thus, we need to prove that

$$2\sum a^3b^3 + 3a^2b^2c^2 \le 2(de)^3 = 2(a^2b^2 + b^2c^2 + c^2a^2)^{3/2}.$$

With the obvious substitutions, this is equivalent to the inequality

$$2\sum x^3 + 3xyz \le 2\left(\sum x^2\right)^{3/2}.$$

Using Cauchy-Schwarz

$$\left(2\sum x^3 + 3xyz\right)^2 = \left(\sum x(2x^2 + yz)\right)^2 \le \left(\sum x^2\right)\left(\sum(2x^2 + yz)^2\right),$$

it remains to prove that

$$\sum(2x^2 + yz)^2 \le 4\left(\sum x^2\right)^2,$$

which follows immediately from $x^2y^2 + y^2z^2 + z^2x^2 \ge xyz(x + y + z)$, itself
equivalent to $\sum(xy - xz)^2 \ge 0$. □


Proof. Consider the polynomial $p(t) = t^3 - \sigma_1t^2 + \sigma_2t - \sigma_3$ with $\sigma_1$ and $\sigma_2$
fixed and $\sigma_3$ allowed to vary. Regard the roots $x, y, z$ of $p(t)$ as functions of $\sigma_3$.
Suppose $f : \mathbb{R} \to \mathbb{R}$ is any smooth function (at least three times differentiable
for the discussion below). Define a function

$$g(\sigma_3) = f(x) + f(y) + f(z).$$

Then $g$ is a differentiable function of $\sigma_3$ and

$$g'(\sigma_3) = \frac{f'(x)}{(x - y)(x - z)} + \frac{f'(y)}{(y - z)(y - x)} + \frac{f'(z)}{(z - x)(z - y)} = \frac{1}{2}f'''(\xi)$$

for some $\xi$ with $\min(x, y, z) < \xi < \max(x, y, z)$. To prove the first equality
one merely checks that the curve

$$s \mapsto \left(x + \frac{s}{(x - y)(x - z)},\ y + \frac{s}{(y - z)(y - x)},\ z + \frac{s}{(z - x)(z - y)}\right)$$

preserves $\sigma_1$, has $\frac{d\sigma_2}{ds} = 0$, and has $\frac{d\sigma_3}{ds} = 1$. For the second equality
note that

$$f'(t) - \frac{f'(x)(t - y)(t - z)}{(x - y)(x - z)} - \frac{f'(y)(t - z)(t - x)}{(y - z)(y - x)} - \frac{f'(z)(t - x)(t - y)}{(z - x)(z - y)}$$

vanishes at $t = x, y, z$. Hence by two applications of Rolle’s Theorem, its
second derivative vanishes at some $\xi$ in $(\min(x, y, z), \max(x, y, z))$ and this is
the required $\xi$.

Apply this to $x = a^2$, $y = b^2$, $z = c^2$, and $f(u) = u^{3/2}$. This amounts to
choosing $x, y, z$ to be the roots of $p(t) = t^3 - (d^2 + e^2)t^2 + d^2e^2t - \sigma_3$. Since
$f'''(\xi) < 0$ for all $\xi > 0$, we see that $g$ is a decreasing function of $\sigma_3$ for $\sigma_3 > 0$.
Thus $g(0) \ge g(\sigma_3)$ which, unwinding definitions, is the desired inequality.

This argument shows that the same inequality holds for any exponent
strictly between 2 and 4 and gives similar (sometimes reversed) inequalities
for other exponents. □


The reader will probably appreciate the beauty of the following inequality. 
It is more difficult than it appears at first sight, as the obvious application of 
Cauchy-Schwarz fails rather badly. 


22. Prove that for any real numbers $x_1, x_2, \ldots, x_n$ the following inequality
holds

$$\left(\sum_{i=1}^n\sum_{j=1}^n |x_i - x_j|\right)^2 \le \frac{2(n^2 - 1)}{3}\sum_{i=1}^n\sum_{j=1}^n |x_i - x_j|^2.$$

IMO 2003


Proof. The first step is to order the $x_i$’s, say $x_1 \le x_2 \le \cdots \le x_n$. Then

$$\sum_{i,j=1}^n |x_i - x_j| = 2\sum_{i<j}(x_j - x_i) = 2\left(-(n - 1)x_1 - (n - 3)x_2 + \cdots + (n - 1)x_n\right).$$

Thus, applying Cauchy-Schwarz yields

$$\left(\sum_{i,j}|x_i - x_j|\right)^2 \le 4\sum_{i=1}^n(2i - n - 1)^2\sum_{i=1}^n x_i^2.$$

It is easy to compute the last expression and the final estimate that we
obtain is

$$\left(\sum_{i,j}|x_i - x_j|\right)^2 \le \frac{4n(n^2 - 1)}{3}\sum_{i=1}^n x_i^2.$$

On the other hand, Legendre’s identity shows that

$$\sum_{i,j=1}^n(x_i - x_j)^2 = 2n\sum_{i=1}^n x_i^2 - 2\left(\sum_{i=1}^n x_i\right)^2.$$

Well, unfortunately, when we combine all this we see that we are not
done, because of the bad term $2\left(\sum_{i=1}^n x_i\right)^2$. Fortunately, it is easy to repair
the argument: indeed, we can always add the same number to all $x_i$’s without
changing the hypothesis or the conclusion of the problem. Thus, we may
assume that $x_1 + x_2 + \cdots + x_n = 0$. But then the previous inequalities allow
us to conclude the proof. □
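
The equality case of Cauchy-Schwarz in this argument (the centred $x_i$ proportional to $2i - n - 1$) suggests that arithmetic progressions give equality in the inequality $\left(\sum_{i,j}|x_i - x_j|\right)^2 \le \frac{2(n^2-1)}{3}\sum_{i,j}(x_i - x_j)^2$. The sketch below, with a helper name `both_sides` of our own choosing, checks this in exact arithmetic on a few progressions and on one non-progression sample.

```python
from fractions import Fraction

def both_sides(xs):
    """Return (left, right) sides of the IMO 2003 inequality for the list xs."""
    n = len(xs)
    abs_sum = sum(abs(xi - xj) for xi in xs for xj in xs)
    sq_sum = sum((xi - xj) ** 2 for xi in xs for xj in xs)
    return abs_sum ** 2, Fraction(2 * (n * n - 1), 3) * sq_sum

# Arithmetic progressions 1, 4, 7, ... give equality.
for n in range(2, 9):
    left, right = both_sides([Fraction(3 * i + 1) for i in range(n)])
    assert left == right

# A sample that is not an arithmetic progression gives strict inequality.
left, right = both_sides([Fraction(v) for v in (0, 1, 5, 6, 20)])
assert left < right
```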


The following problem is a very tricky application of the Cauchy-Schwarz 
inequality. The technique used in the proof is worth remembering, since it is 
quite useful. It is also a standard tool in analysis and probability (it is actually 
likely that the following problem is inspired by probability theory). 


23. Let $a_1, a_2, \ldots, a_n$ be positive real numbers which add up to 1. Let $n_i$ be
the number of integers $k$ such that $2^{1-i} \ge a_k > 2^{-i}$. Prove that

$$\sum_{i \ge 1}\sqrt{\frac{n_i}{2^i}} \le 4 + \sqrt{\log_2(n)}.$$


L. Leindler, Miklos Schweitzer Competition 


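Proof. Choose a positive integer $N$ and split the sum in two parts: the one
for $i \le N$ and the one for $i > N$. Apply the Cauchy-Schwarz inequality for
each of them to obtain

$$\sum_{i \ge 1}\sqrt{\frac{n_i}{2^i}} \le \sqrt{N\sum_{i \le N}\frac{n_i}{2^i}} + \sqrt{\left(\sum_{i > N}\frac{1}{2^i}\right)\left(\sum_{i > N}n_i\right)}.$$

On the other hand, we have

$$\sum_{i > N}\frac{1}{2^i} = \frac{1}{2^N} \quad \text{and} \quad \sum_{i > N}n_i \le \sum_{i \ge 1}n_i = n,$$

the last relation being obvious by definition of the $n_i$’s. Finally, and most
importantly we have

$$\sum_{i=1}^N\frac{n_i}{2^i} = \sum_{i=1}^N\frac{1}{2^i}\sum_{2^{-i} < a_k \le 2^{1-i}}1 < \sum_{k=1}^n a_k = 1.$$

Putting these inequalities together, we deduce that

$$\sum_{i \ge 1}\sqrt{\frac{n_i}{2^i}} \le \sqrt{N} + \sqrt{\frac{n}{2^N}}.$$

Taking $N = \log_2(n)$, we obtain an even stronger (and strict!) inequality, in
which 4 is replaced by 1. □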


We end the series of moderately difficult problems with a very nice-looking 
improvement of an IMO 2004 problem. 


24. Let $n > 2$ be an integer. Find the greatest real number $k$ with the
following property: if the positive real numbers $x_1, x_2, \ldots, x_n$ satisfy

$$k > (x_1 + x_2 + \cdots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right),$$

then any three of them are sides of a triangle.

Adapted after IMO 2004


Proof. The main point is to solve this problem for $n = 3$. If $b, c$ are positive
real numbers, let us look at the possible values of

$$f(x) = (x + b + c)\left(\frac{1}{x} + \frac{1}{b} + \frac{1}{c}\right)$$

when $x \ge b + c$. It is not difficult to check (either directly or by computing
the derivative) that $f$ is increasing in this domain of $x$, so that

$$f(x) \ge f(b + c) = 2(b + c)\left(\frac{1}{b} + \frac{1}{c} + \frac{1}{b + c}\right) = 2 + \frac{2(b + c)^2}{bc} \ge 10.$$

We deduce that if $x, b, c$ satisfy $f(x) < 10$, then $x, b, c$ are the sides of a triangle
(since everything is symmetric in $x, b, c$). Thus, for $n = 3$ the answer is 10.

To reduce the general problem to the case $n = 3$, we use Cauchy-Schwarz,
which allows us to obtain information about $x_1, x_2, x_3$ knowing that

$$k > (x_1 + x_2 + \cdots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right).$$

Indeed, using Cauchy-Schwarz and the inequality

$$(x_4 + \cdots + x_n)\left(\frac{1}{x_4} + \cdots + \frac{1}{x_n}\right) \ge (n - 3)^2,$$

we obtain

$$\sqrt{k} > n - 3 + \sqrt{(x_1 + x_2 + x_3)\left(\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3}\right)},$$

so that

$$(x_1 + x_2 + x_3)\left(\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3}\right) < (\sqrt{k} - n + 3)^2.$$

Using this and the previous case ($n = 3$), we deduce that if

$$(n - 3 + \sqrt{10})^2 > (x_1 + x_2 + \cdots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right),$$

then any three among the numbers $x_1, x_2, \ldots, x_n$ are the sides of a triangle.
It remains to check that this is indeed optimal. To see this, choose

$$x_4 = x_5 = \cdots = x_n = 1, \quad x_1 = x_2 = \frac{\sqrt{10}}{4}, \quad x_3 = \frac{\sqrt{10}}{2}.$$

Then $x_1, x_2, x_3$ are not the sides of a triangle and

$$(x_1 + x_2 + \cdots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right) = (n - 3 + \sqrt{10})^2.$$

Thus, the answer is $k_n = (n - 3 + \sqrt{10})^2$. □
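
The optimality example can be verified numerically. The sketch below is an illustration under our own naming (`sum_times_inverse_sum`, `extremal`); it uses the values $x_1 = x_2 = \sqrt{10}/4$, $x_3 = \sqrt{10}/2$, $x_4 = \cdots = x_n = 1$, which are the ones making both the case $n = 3$ and the Cauchy-Schwarz step tight, and checks that the product equals $(n - 3 + \sqrt{10})^2$ while $x_1 + x_2 = x_3$.

```python
import math

def sum_times_inverse_sum(xs):
    """(x_1 + ... + x_n)(1/x_1 + ... + 1/x_n)."""
    return sum(xs) * sum(1 / x for x in xs)

def extremal(n):
    """x_1 = x_2 = sqrt(10)/4, x_3 = sqrt(10)/2, x_4 = ... = x_n = 1."""
    r = math.sqrt(10)
    return [r / 4, r / 4, r / 2] + [1.0] * (n - 3)

for n in range(3, 10):
    xs = extremal(n)
    # The product hits the threshold (n - 3 + sqrt(10))^2 up to rounding.
    assert abs(sum_times_inverse_sum(xs) - (n - 3 + math.sqrt(10)) ** 2) < 1e-9
    # x_1 + x_2 = x_3: the first three numbers form only a degenerate triangle.
    assert abs(xs[0] + xs[1] - xs[2]) < 1e-12
```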




We give two proofs for the next difficult problem. The first is based on a 
very tricky application of Cauchy-Schwarz combined with a mixing-variables 
argument, while the second one is a pure, but very technical, mixing-variables 
argument. 


25. If $a, b, c, d, e$ are real numbers such that $a + b + c + d + e = 0$, then

$$(a^2 + b^2 + c^2 + d^2 + e^2)^2 \le \frac{30}{7}(a^4 + b^4 + c^4 + d^4 + e^4).$$

Vasile Cartoaje


Proof. First of all, we may assume that among $a, b, c, d, e$ at least three of
the numbers are nonnegative, say $a, b, c$. This follows immediately from the
pigeonhole principle, possibly after changing the sign of all numbers. The key
step is the following very tricky application of Cauchy-Schwarz:

$$\begin{aligned} 9(a^4 + b^4 + c^4 + d^4 + e^4) &= [9(a^4 + b^4 + c^4) + 2(d^4 + e^4)] + (7d^4) + (7e^4) \\ &\ge \frac{\left(2\sqrt{21(9(a^4 + b^4 + c^4) + 2(d^4 + e^4))} + 21(d^2 + e^2)\right)^2}{84 + 63 + 63}.\end{aligned}$$

A small computation yields the equivalent form of this inequality

$$\frac{30}{7}(a^4 + b^4 + c^4 + d^4 + e^4) \ge \left(\frac{2\sqrt{9(a^4 + b^4 + c^4) + 2(d^4 + e^4)}}{\sqrt{21}} + d^2 + e^2\right)^2.$$

This reduces the problem to showing that

$$2\sqrt{9(a^4 + b^4 + c^4) + 2(d^4 + e^4)} \ge \sqrt{21}\,(a^2 + b^2 + c^2).$$

To exploit the relationship between $a, b, c, d, e$, we use the fact that

$$d^4 + e^4 \ge \frac{(d + e)^4}{8} = \frac{(a + b + c)^4}{8}.$$

Thus, it remains to prove that

$$36(a^4 + b^4 + c^4) + (a + b + c)^4 \ge 21(a^2 + b^2 + c^2)^2$$

for all nonnegative numbers $a, b, c$.
This inequality does not seem to follow easily from well-known results, so
we will employ the powerful technique of mixing variables to prove it. Let

$$\begin{aligned} f(a, b, c) &= 36(a^4 + b^4 + c^4) + (a + b + c)^4 - 21(a^2 + b^2 + c^2)^2 \\ &= 15(a^4 + b^4 + c^4) - 42(a^2b^2 + b^2c^2 + c^2a^2) + (a + b + c)^4.\end{aligned}$$

We will first show that for $a = \min(a, b, c)$ we have

$$f(a, b, c) \ge f\left(a, \frac{b + c}{2}, \frac{b + c}{2}\right).$$

For this we compute

$$\begin{aligned} f(a, b, c) - f\left(a, \frac{b + c}{2}, \frac{b + c}{2}\right) &= 15\left(b^4 + c^4 - \frac{(b + c)^4}{8}\right) - 42a^2\left(b^2 + c^2 - \frac{(b + c)^2}{2}\right) - 42\left(b^2c^2 - \frac{(b + c)^4}{16}\right) \\ &= \frac{3}{8}(b - c)^2\left[42b^2 + 92bc + 42c^2 - 56a^2\right] \ge 0,\end{aligned}$$

where the final inequality follows since $28b^2 + 28c^2 - 56a^2 \ge 0$. Thus we reduce
to the case where $a \le b = c$. In this case we compute

$$\begin{aligned} f(a, b, b) &= 4(b^4 + 8b^3a - 15b^2a^2 + 2ba^3 + 4a^4) \\ &= 4(b - a)^2(b^2 + 10ab + 4a^2) \\ &\ge 0.\end{aligned}$$

The result follows.
Note that we have equality for $a = b = c = 2$ and $d = e = -3$. □


Proof. We will use a mixing variables argument. Let

$$F(a, b, c, d, e) = 30(a^4 + b^4 + c^4 + d^4 + e^4) - 7(a^2 + b^2 + c^2 + d^2 + e^2)^2.$$

We want to prove that for $a + b + c + d + e = 0$ we have $F(a, b, c, d, e) \ge 0$.
The basic formula we need is that

$$F(a, b, c, d, e) - F\left(a, b, c, \frac{d + e}{2}, \frac{d + e}{2}\right) = (d - e)^2\left(21d^2 + 34de + 21e^2 - 7(a^2 + b^2 + c^2)\right),$$

which can be checked by tedious computation.
By the pigeonhole principle three of $a, b, c, d, e$ must have the same sign
(counting zero as having either sign). By symmetry, we may assume $a, b, c \ge 0$.
Then

$$7(a^2 + b^2 + c^2) \le 7(a + b + c)^2 = 7(d + e)^2 \le 17(d + e)^2 + 4(d^2 + e^2) = 21d^2 + 34de + 21e^2.$$

Therefore by the formula above $F(a, b, c, d, e) \ge F\left(a, b, c, \frac{d+e}{2}, \frac{d+e}{2}\right)$. Thus we
may assume three of $a, b, c, d, e$ are nonnegative and the other two are equal. To
keep the same basic formula, we invoke symmetry and switch our assumption
to $c, d, e \ge 0$ and $a = b = -(c + d + e)/2$. Now suppose $c \le e$. Then we have

$$\begin{aligned} 7(a^2 + b^2 + c^2) &= \frac{7}{2}(c + d + e)^2 + 7c^2 \\ &\le \frac{7}{2}(d + 2e)^2 + 7e^2 = \frac{7}{2}d^2 + 14de + 21e^2 \\ &\le 21d^2 + 34de + 21e^2.\end{aligned}$$

Therefore we again have $F(a, b, c, d, e) \ge F\left(a, b, c, \frac{d+e}{2}, \frac{d+e}{2}\right)$. We conclude
that we can repeatedly average the largest of $c, d, e$ with either of the other
two and we will only lower $F$. By continuity of $F$, we therefore reduce to
the case where $a = b$ and $c = d = e$. In this case it is easy to check that
$F(a, b, c, d, e) = 0$ and we are done. □
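
The "tedious computation" behind the basic formula $F(a,b,c,d,e) - F\left(a,b,c,\frac{d+e}{2},\frac{d+e}{2}\right) = (d-e)^2\left(21d^2 + 34de + 21e^2 - 7(a^2+b^2+c^2)\right)$ can be spot-checked in exact arithmetic. The following is a sketch with helper names of our own (`gap`, `claimed`); both sides are polynomials of degree at most 4 in each variable, so agreement on a grid with five values per variable is already conclusive.

```python
from fractions import Fraction
from itertools import product

def F(a, b, c, d, e):
    s4 = a**4 + b**4 + c**4 + d**4 + e**4
    s2 = a**2 + b**2 + c**2 + d**2 + e**2
    return 30 * s4 - 7 * s2**2

def gap(a, b, c, d, e):
    """F(a, b, c, d, e) - F(a, b, c, (d+e)/2, (d+e)/2), computed exactly."""
    m = Fraction(d + e, 2)
    return F(a, b, c, d, e) - F(a, b, c, m, m)

def claimed(a, b, c, d, e):
    """Right-hand side of the basic formula."""
    return (d - e) ** 2 * (21 * d * d + 34 * d * e + 21 * e * e
                           - 7 * (a * a + b * b + c * c))

values = range(-2, 3)  # five integer sample values per variable
assert all(gap(*t) == claimed(*t) for t in product(values, repeat=5))

# Equality case of the inequality itself: a = b = c = 2, d = e = -3.
assert F(2, 2, 2, -3, -3) == 0
```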




We end this chapter with a challenging inequality, which combines some 
clever uses of Cauchy-Schwarz with a tricky homogeneity argument. This 
result also appears in [35] and it is a generalization of a problem discussed in 
[3], chapter 2, example 11. 


26. Prove that for any positive real numbers $a_1, a_2, \ldots, a_n$, $x_1, x_2, \ldots, x_n$
such that

$$\sum_{1 \le i < j \le n}x_ix_j = \binom{n}{2},$$

the following inequality holds

$$\frac{a_1}{a_2 + \cdots + a_n}(x_2 + \cdots + x_n) + \cdots + \frac{a_n}{a_1 + \cdots + a_{n-1}}(x_1 + \cdots + x_{n-1}) \ge n.$$

Vasile Cartoaje, Gabriel Dospinescu
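Proof. First of all, it is enough to prove that for any $x_1, x_2, \ldots, x_n > 0$ and
any $a_1, a_2, \ldots, a_n > 0$ we have

$$\sum\frac{a_1}{a_2 + \cdots + a_n}(x_2 + \cdots + x_n) \ge n\sqrt{\frac{\sum_{1 \le i < j \le n}x_ix_j}{\binom{n}{2}}}.$$

Since this is homogeneous, it is enough to prove it when $x_1 + x_2 + \cdots + x_n = 1$.
In this case, it is equivalent to

$$\sum\frac{a_1}{a_2 + \cdots + a_n} \ge \sum\frac{a_1}{a_2 + \cdots + a_n}x_1 + n\sqrt{\frac{\sum_{1 \le i < j \le n}x_ix_j}{\binom{n}{2}}}.$$

But Cauchy-Schwarz shows that

$$\sum\frac{a_1}{a_2 + \cdots + a_n}x_1 \le \sqrt{\sum\left(\frac{a_1}{a_2 + \cdots + a_n}\right)^2}\cdot\sqrt{\sum x_1^2}$$

and a second application of Cauchy-Schwarz yields

$$\sqrt{\sum\left(\frac{a_1}{a_2 + \cdots + a_n}\right)^2}\cdot\sqrt{\sum x_1^2} + \sqrt{\frac{n}{n-1}}\cdot\sqrt{2\sum_{1 \le i < j \le n}x_ix_j} \le \sqrt{\frac{n}{n-1} + \sum\left(\frac{a_1}{a_2 + \cdots + a_n}\right)^2}.$$

So, it remains to prove that

$$\left(\sum\frac{a_1}{a_2 + \cdots + a_n}\right)^2 \ge \frac{n}{n-1} + \sum\left(\frac{a_1}{a_2 + \cdots + a_n}\right)^2.$$

A very elegant approach for this inequality was proposed by Darij Grinberg
in [35], where a more general result is proved. We may assume that

$$a_1 + a_2 + \cdots + a_n = 1.$$

We need to prove that

$$\sum_{i<j}\frac{a_ia_j}{(1 - a_i)(1 - a_j)} \ge \frac{n}{2n - 2}.$$

Using T2’s lemma (a form of Cauchy-Schwarz), we reduce this to proving that

$$\left(\sum_{i<j}a_ia_j\right)^2 \ge \frac{n}{2n - 2}\sum_{i<j}a_ia_j(1 - a_i)(1 - a_j).$$

Note that

$$\sum_{i<j}a_ia_j = \frac{1}{2}\left(1 - \sum_i a_i^2\right) = \frac{1}{2}\sum_i a_i(1 - a_i).$$

Thus, if we let $b_i = a_i(1 - a_i)$, it remains to prove that

$$(b_1 + b_2 + \cdots + b_n)^2 \ge \frac{2n}{n-1}\sum_{i<j}b_ib_j.$$

This follows directly from Cauchy-Schwarz and the identity

$$2\sum_{i<j}b_ib_j = (b_1 + b_2 + \cdots + b_n)^2 - (b_1^2 + b_2^2 + \cdots + b_n^2).$$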


Remark 2.1. We can also prove the inequality

$$\sum_{i<j}\frac{a_ia_j}{(1 - a_i)(1 - a_j)} \ge \frac{n}{2n - 2}$$

by mixing variables: consider the map

$$F(a_1, a_2, \ldots, a_n) = 2\sum_{i<j}\frac{a_ia_j}{(1 - a_i)(1 - a_j)}$$

and set $x = \frac{a_1 + a_2}{2}$. We claim that $F(a_1, a_2, \ldots, a_n) \ge F(x, x, a_3, \ldots, a_n)$.
A small computation shows that this is equivalent to

$$2\left(\frac{a_1}{1 - a_1} + \frac{a_2}{1 - a_2} - \frac{2x}{1 - x}\right)\sum_{i \ge 3}\frac{a_i}{1 - a_i} + \frac{2a_1a_2}{(1 - a_1)(1 - a_2)} - \frac{2x^2}{(1 - x)^2} \ge 0.$$

Another computation shows that

$$\frac{a_1}{1 - a_1} + \frac{a_2}{1 - a_2} - \frac{2x}{1 - x} = \frac{(a_1 - a_2)^2}{2(1 - a_1)(1 - a_2)(1 - x)}$$

and

$$\frac{2x^2}{(1 - x)^2} - \frac{2a_1a_2}{(1 - a_1)(1 - a_2)} = \frac{(a_1 - a_2)^2}{2}\cdot\frac{1 - 2x}{(1 - x)^2(1 - a_1)(1 - a_2)}.$$

Thus, the inequality $F(a_1, a_2, \ldots, a_n) \ge F(x, x, a_3, \ldots, a_n)$ is equivalent to

$$2\sum_{i \ge 3}\frac{a_i}{1 - a_i} \ge \frac{1 - 2x}{1 - x}.$$

But this is easy, since the left-hand side is at least $2\sum_{i \ge 3}a_i = 2(1 - 2x)$
and $2(1 - 2x) \ge \frac{1 - 2x}{1 - x}$ is equivalent to $(1 - 2x)^2 \ge 0$. Thus, we have
$F(a_1, a_2, \ldots, a_n) \ge F(x, x, a_3, \ldots, a_n)$. Continuing to mix variables in this
way and using the continuity of $F$ implies that $F(a_1, a_2, \ldots, a_n)$ is at least
$F(m, m, \ldots, m)$, where $m$ is the arithmetic mean of the $a_i$’s, namely $1/n$. The
result follows.


2.1 Notes 


The following people provided solutions to the problems discussed in 
this chapter: Vo Quoc Ba Can (problem 20), Ta Minh Hoang (problem 17), 
Mitchell Lee (problem 6), Dung Tran Nam (problem 25), Dusan Sobot (prob- 
lems 1, 2, 5, 7), Richard Stong (problems 14, 19, 21, 25), Gjergji Zaimi (prob- 
lems 3, 4, 10, 13). 


Addendum 2.A  Cauchy-Schwarz in 
Number Theory 


This addendum shows some very beautiful applications of the Cauchy- 
Schwarz inequality to number theory problems. We present Gallagher’s sieve 
and a beautiful arithmetic application; we discuss the large sieve, an amaz- 
ing tool invented by Linnik and extensively developed by a series of brilliant 
mathematicians; finally we discuss another famous result, the Turan-Kubilius 
inequality. The Cauchy-Schwarz inequality plays an important role in the 
proofs of all these theorems, which are elementary but quite powerful: this 
ought to show the reader how Cauchy-Schwarz appears in “real mathematics” 
and not only in olympiad-type problems. Analytic number theory has the rep- 
utation of being rather technical and this addendum is no exception. We hope 
that the results presented here (especially their applications) will compensate 
for this nonetheless. 

The following two sections try to give satisfactory answers to the following 
natural problem: suppose that A is a set of integers such that 


$$A \pmod{p} = \{x \pmod{p} \mid x \in A\}$$


is relatively small for all primes p in a finite set P. Are there nontrivial bounds 
on the size of A? 


2.A.1 Gallagher’s sieve and a nice application 


Recall that the von Mangoldt function $\Lambda$ is defined by $\Lambda(p^n) = \log p$ if
$p$ is a prime and $n \ge 1$ and $\Lambda(x) = 0$ for any other integer $x$. The crucial
property of $\Lambda$ is that

$$\sum_{d \mid n}\Lambda(d) = \log n$$

for all $n$. The following theorem is a pretty tricky application of the Cauchy-
Schwarz inequality, but the application given below reveals its usefulness and
power.


2.A. Cauchy-Schwarz in Number Theory 63 


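Theorem 2.A.1. (Gallagher’s larger sieve) Let $S$ be a finite nonempty set of
integers and let $P$ be a finite set of prime powers. Assume that for each $p \in P$
we can find a real number $u(p) \ge |\{s \pmod{p} \mid s \in S\}|$ such that

$$\sum_{p \in P}\frac{\Lambda(p)}{u(p)} > \log 2X,$$

where $X = \max_{s \in S}|s|$. Then

$$|S| \le \frac{\sum_{p \in P}\Lambda(p) - \log 2X}{\sum_{p \in P}\frac{\Lambda(p)}{u(p)} - \log 2X}.$$

Proof. Let $p \in P$ and let $s(r, p)$ be the number of elements of $S$ that are
congruent to $r$ modulo $p$. Then by Cauchy-Schwarz and the fact that

$$u(p) \ge |\{s \pmod{p} \mid s \in S\}|$$

we have

$$|S|^2 = \left(\sum_{r=0}^{p-1}s(r, p)\right)^2 \le u(p)\sum_{r=0}^{p-1}s(r, p)^2,$$

thus

$$\frac{|S|^2}{u(p)} \le \sum_{r=0}^{p-1}\ \sum_{\substack{s_1, s_2 \in S \\ s_1 \equiv s_2 \equiv r \pmod{p}}}1 = |S| + \sum_{\substack{s_1 \ne s_2 \in S \\ p \mid s_1 - s_2}}1.$$

Multiplying this by $\Lambda(p)$ and summing over all $p$ yields

$$|S|^2\sum_{p \in P}\frac{\Lambda(p)}{u(p)} \le |S|\sum_{p \in P}\Lambda(p) + \sum_{s_1 \ne s_2}\ \sum_{p \mid s_1 - s_2}\Lambda(p).$$

As

$$\sum_{p \mid s_1 - s_2}\Lambda(p) \le \log(|s_1 - s_2|) \le \log 2X,$$

we deduce that

$$|S|^2\sum_{p \in P}\frac{\Lambda(p)}{u(p)} \le |S|\sum_{p \in P}\Lambda(p) + (|S|^2 - |S|)\log 2X,$$

from which the result follows immediately. □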




The promised application (taken from [19]) requires some preliminaries. 
The following result is very classical, being related to the following natural 
question: given two positive integers, what is the probability that they are 
relatively prime? 

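Proposition 2.A.2. As $x \to \infty$, we have

$$\sum_{k \le x}\varphi(k) = \frac{3}{\pi^2}x^2 + O(x\ln x).$$

Proof. The key point is the equality

$$\frac{\varphi(k)}{k} = \sum_{d \mid k}\frac{\mu(d)}{d},$$

which follows easily from the classical formula $\varphi(k) = k\prod_{p \mid k}\left(1 - \frac{1}{p}\right)$. Thus

$$\sum_{k \le x}\varphi(k) = \sum_{k \le x}k\sum_{d \mid k}\frac{\mu(d)}{d} = \sum_{d \le x}\mu(d)\sum_{j \le x/d}j = \frac{x^2}{2}\sum_{d \le x}\frac{\mu(d)}{d^2} + O\left(\sum_{d \le x}\frac{x}{d}\right).$$

Note that $\sum_{d \le x}\frac{x}{d} = O(x\ln x)$. Next, we claim that

$$\sum_{d \ge 1}\frac{\mu(d)}{d^2} = \frac{6}{\pi^2}.$$

This follows from Euler’s celebrated formula $\sum_{n \ge 1}\frac{1}{n^2} = \frac{\pi^2}{6}$ and the following
computation¹

$$\sum_{n \ge 1}\frac{1}{n^2}\cdot\sum_{d \ge 1}\frac{\mu(d)}{d^2} = \sum_{n, d \ge 1}\frac{\mu(d)}{(nd)^2} = \sum_{k \ge 1}\frac{1}{k^2}\sum_{d \mid k}\mu(d) = 1.$$

Finally, as

$$\left|\sum_{d > x}\frac{\mu(d)}{d^2}\right| \le \sum_{d > x}\frac{1}{d^2} = O\left(\frac{1}{x}\right),$$

we obtain the desired result by combining the previous observations. □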
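
The asymptotic $\sum_{k \le x}\varphi(k) = \frac{3}{\pi^2}x^2 + O(x\ln x)$ is easy to observe experimentally. The sketch below is an illustration only: the sieve `totients` and the cut-off `x = 20000` are our choices, and the final assertion uses $x\ln x$ with implied constant 1 as a convenient yardstick, which is an assumption about this particular $x$ rather than a statement of the proposition.

```python
import math

def totients(limit):
    """Return [phi(0), ..., phi(limit)] via a sieve (phi(0) is a placeholder 0)."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # untouched so far, hence p is prime
            for multiple in range(p, limit + 1, p):
                # Apply the factor (1 - 1/p) of the product formula.
                phi[multiple] -= phi[multiple] // p
    return phi

x = 20000
total = sum(totients(x)[1:])
main_term = 3 * x * x / math.pi ** 2

# The discrepancy is far smaller than the main term of size about x^2.
assert abs(total - main_term) < x * math.log(x)
```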


Theorem 2.A.3. Let $a, b > 1$ be integers such that for any prime power $p$
there exists $k \ge 1$ (depending on $p$) such that $b \equiv a^k \pmod{p}$. Then $b$ is an
integral power of $a$.


Proof. We may assume that $a \ge 3$ (as we may work with $a^2$ and $b^2$ instead of
$a$ and $b$). The most difficult step is to establish that $\ln a$ and $\ln b$ are linearly
dependent over $\mathbb{Q}$. Let us assume that this is not true and consider a large
number $x$. Let $S_x$ be the set of numbers smaller than $x$ and of the form $a^i \cdot b^j$,
with $i, j$ nonnegative integers. As $\ln a$ and $\ln b$ are linearly independent over
$\mathbb{Q}$, the set $S_x$ has the same number of elements as the number of pairs $(i, j)$ for
which $i\ln a + j\ln b < \ln x$. So, there is a constant $c > 0$ depending
only on $a$ and $b$ such that $|S_x| \ge c(\ln x)^2$. We will bound $|S_x|$ from above
using Gallagher’s sieve.

For any positive integer $y$ let $P_y$ be the set of prime powers dividing at
least one of the numbers $a - 1, a^2 - 1, \ldots, a^y - 1$. For each $p \in P_y$, let $u(p)$ be
the order of $a$ mod $p$. Then $u(p) \le y$ for all $p \in P_y$, and, since $b$ is a power of
$a$ modulo $p$, we have $|S_x \pmod{p}| \le u(p)$ for all $p \in P_y$. To continue, we need
a technical lemma.


¹Which uses the absolute convergence of the double series, as well as the fact that
$\sum_{d \mid k}\mu(d)$ equals 0 for all $k > 1$ and 1 for $k = 1$.




Lemma 2.A.4. There exist constants $c_1, c_2 > 0$, depending only on $a \ge 3$,
such that for all $y > 1$ we have

$$c_1y^2 \le \sum_{p \in P_y}\Lambda(p) \le c_2y^2.$$

Proof. One estimate is very easy, since

$$\sum_{p \in P_y}\Lambda(p) \le \sum_{k \le y}\ \sum_{d \mid a^k - 1}\Lambda(d) = \sum_{k \le y}\ln(a^k - 1) \le \frac{y(y + 1)}{2}\ln a.$$

The other estimate is more delicate and crucially uses properties of cyclotomic
polynomials² and proposition 2.A.2. Let $\Phi_n$ be the $n$th cyclotomic polynomial.
Remark 9.5 implies that

$$\sum_{u(p) = d}\Lambda(p) \ge \sum_{p \mid \Phi_d(a)}\Lambda(p) - \sum_{p \mid d}\Lambda(p) = \ln\Phi_d(a) - \ln d.$$

By the very definition of $\Phi_d$ we have $\ln\Phi_d(a) \ge \varphi(d) \cdot \ln(a - 1)$. Hence

$$\sum_{p \in P_y}\Lambda(p) = \sum_{d \le y}\ \sum_{u(p) = d}\Lambda(p) \ge \sum_{d \le y}\varphi(d) \cdot \ln(a - 1) - \sum_{d \le y}\ln d.$$

Since $\sum_{d \le y}\ln d = O(y\ln y)$, it suffices to use proposition 2.A.2 to conclude. □


The previous lemma shows that by choosing $c$ correctly and $y$ about $c\ln x$,
we can ensure that

$$\sum_{p \in P_y}\frac{\Lambda(p)}{u(p)} \ge \frac{1}{y}\sum_{p \in P_y}\Lambda(p) \ge 2\ln(2x)$$

and so by Gallagher’s sieve

$$|S_x| \le \sum_{p \in P_y}\Lambda(p) \cdot \left(\sum_{p \in P_y}\frac{\Lambda(p)}{u(p)} - \ln(2x)\right)^{-1} \le c_3\ln x$$

for an absolute constant $c_3$. This contradicts the first paragraph for large $x$.

Hence $\ln a$ and $\ln b$ are linearly dependent over $\mathbb{Q}$ and so there exists an
integer $c > 1$ and positive integers $i, j$, relatively prime, such that $a = c^i$ and
$b = c^j$. If $p$ is a prime power divisor of $c^i - 1$, there is $k_p$ such that $p$ divides
$c^{ik_p} - c^j$ and so $p$ divides $c^j - 1$. We deduce that $c^i - 1$ divides $c^j - 1$ and so
$i$ divides $j$. The result follows. □


²For more details the reader is referred to section 9.2.


Remark 2.A.5. The result does not hold if we consider only primes instead of
prime powers. For instance, using properties of quadratic residues (nothing more
than the multiplicativity of Legendre's symbol) one can easily prove that 16
is an 8th power modulo any prime. Of course, 16 is not an eighth power of
an integer. In the beautiful papers [4] and [32],³ the following general result
is proved using techniques of algebraic number theory:


Theorem 2.A.6. Let $n > 1$ be an integer and let $a$ be an integer such that
$a$ is an $n$th power modulo any sufficiently large⁴ prime. Then either $a$ is the
$n$th power of an integer or $8 \mid n$ and $a = 2^{n/2} b^n$ for some integer $b$.
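
The example of remark 2.A.5 is easy to test by brute force. The following Python sketch (the helper names `primes_up_to` and `is_nth_power_mod` are ours, not the book's) checks that 16 is an 8th power modulo every prime below a chosen bound, and that it is not an 8th power modulo the prime power 32, which is why prime powers are needed.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for multiple in range(i * i, limit + 1, i):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def is_nth_power_mod(a, n, modulus):
    """Brute force: does x**n = a (mod modulus) have a solution x?"""
    return any(pow(x, n, modulus) == a % modulus for x in range(modulus))


if __name__ == "__main__":
    # 16 is an 8th power modulo every prime tested ...
    print(all(is_nth_power_mod(16, 8, p) for p in primes_up_to(300)))
    # ... but not modulo the prime power 32.
    print(is_nth_power_mod(16, 8, 32))
```

The bound 300 is arbitrary; the check is only a finite illustration, not a proof.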


2.A.2 The large sieve 


This rather long section presents a very deep result in analytic number 
theory, known as the large sieve. Introduced by Y. Linnik and extensively 
developed by Rényi, Bombieri, Davenport, Montgomery, Vaughan, Gallagher
(see [8], [9], [14], [25], [37], [44], [46], [57], [55], [56] to cite only a few refer- 
ences), this became a basic tool in modern analytic number theory, with rather 
spectacular results. We start by presenting a vast generalization of a famous 
inequality of Hilbert, due to Montgomery and Vaughan, which, combined with 
some very clever tricks, yields the analytic form of the large sieve inequality. 
Combined with standard results in finite Fourier analysis (for which the reader 
is invited to read addendum 7.A) this yields arithmetic forms of the large sieve. 
This has some amazing applications to the distribution of prime numbers, twin 


³ Also see theorem 9.B.60.
⁴ Actually, the proofs show that it is enough to assume that this holds for a set of primes
of Dirichlet density 1.




primes, least quadratic residue, prime numbers in arithmetic progressions, etc. 
We follow rather closely the amazingly well-written article [56]. 


The analytic form of the large sieve inequality 


We will spend some time proving the following deep theorem, known as 
the analytic form of the large sieve inequality. Let $\|x\| = \min_{n \in \mathbb{Z}} |x - n|$ be
the distance from the real number $x$ to the discrete set $\mathbb{Z}$.


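Theorem 2.A.7. Let $x_1, x_2, \ldots, x_n$ be real numbers such that

$$\|x_i - x_j\| \ge \varepsilon > 0$$

for all $i \ne j$. Let

$$T(x) = \sum_{M < k \le M+N} z_k e^{2i\pi k x}$$

be a trigonometric polynomial, where $M, N \in \mathbb{N}$ and $z_{M+1}, \ldots, z_{M+N} \in \mathbb{C}$.
Then

$$\sum_{j=1}^{n} |T(x_j)|^2 \le \left( N + \frac{1}{\varepsilon} \right) \sum_{M < k \le M+N} |z_k|^2.$$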


Theorem 2.A.7 has a long history and many mathematicians contributed
to it: Davenport-Halberstam, Gallagher (who gave a very simple proof of the
inequality with $\pi N + \frac{1}{\varepsilon}$ instead of $N + \frac{1}{\varepsilon}$), Montgomery and Vaughan, Selberg.
One can prove (this is due to Selberg) that the factor $N + \frac{1}{\varepsilon}$ can be improved
to $N - 1 + \frac{1}{\varepsilon}$ and that this is sharp. The proof of theorem 2.A.7 will span
the next sections.


Tools from linear algebra 


In this section we recall a few standard facts about inequalities concerning
matrix norms and we prove a duality principle. Recall that the standard hermitian
product on $\mathbb{C}^n$ is defined by $\langle x, y \rangle = \sum_{i=1}^{n} x_i \overline{y_i}$, where $x_i$ (respectively
$y_i$) are the coordinates of $x$ (respectively $y$). If $v \in \mathbb{C}^n$, we denote $|v|^2 = \langle v, v \rangle$.


Definition 2.A.8. A matrix $A = (a_{ij}) \in M_n(\mathbb{C})$ is called hermitian if for all
$x, y \in \mathbb{C}^n$ we have $\langle Ax, y \rangle = \langle x, Ay \rangle$.




The reader can easily check that $A = (a_{ij}) \in M_n(\mathbb{C})$ is hermitian if and
only if $a_{ij} = \overline{a_{ji}}$ for all $i, j$. A fundamental result of linear algebra is that for
any such matrix $A$ we can find real numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ and an orthonormal
basis $v_1, v_2, \ldots, v_n$ of $\mathbb{C}^n$ (i.e. $\langle v_i, v_j \rangle = 0$ if $i \ne j$ and 1 otherwise) such that
$A v_i = \lambda_i v_i$ for all $i$. This can also be stated as: all eigenvalues of a hermitian
matrix are real numbers and there exists an orthonormal basis consisting of
eigenvectors.⁵ The following result will be very useful in the next sections.


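Proposition 2.A.9. Let $A$ be a hermitian matrix and let $C > 0$. If the
inequality $|\langle Av, v \rangle| \le C |v|^2$ holds for any eigenvector $v$ of $A$, then it holds for
any $v \in \mathbb{C}^n$.


Proof. Let $v_1, v_2, \ldots, v_n$ be an orthonormal basis of eigenvectors, with corresponding
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Consider any $v \in \mathbb{C}^n$ and write
$v = \sum_{i=1}^{n} x_i v_i$ for some $x_i \in \mathbb{C}$. Then

$$\langle Av, v \rangle = \sum_{i,j} x_i \overline{x_j} \lambda_i \langle v_i, v_j \rangle = \sum_{i=1}^{n} \lambda_i |x_i|^2.$$

By hypothesis we have $|\langle A v_i, v_i \rangle| \le C |v_i|^2$ for all $i$, and since $A v_i = \lambda_i v_i$, we
must have $|\lambda_i| \le C$. Hence

$$|\langle Av, v \rangle| \le C \sum_{i=1}^{n} |x_i|^2 = C |v|^2.$$ □
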
We end this section with a very useful technique. Though elementary, it 
will play a key role in the proof of theorem 2.A.7. 


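Proposition 2.A.10. (Duality principle) Let $(a_{ij})_{1 \le i \le m,\, 1 \le j \le n}$ be complex
numbers and let $C > 0$. The following are equivalent:


1) For all $z_i \in \mathbb{C}$ we have

$$\sum_{j=1}^{n} \Big| \sum_{i=1}^{m} a_{ij} z_i \Big|^2 \le C \sum_{i=1}^{m} |z_i|^2.$$

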
⁵ Recall that if $A = (a_{ij}) \in M_n(\mathbb{C})$ is any matrix and if $\lambda \in \mathbb{C}$ and $v \in \mathbb{C}^n \setminus \{0\}$ satisfy
$Av = \lambda v$, then we say that $v$ is an eigenvector of $A$ associated to the eigenvalue $\lambda$.




2) For all $y_j \in \mathbb{C}$ we have

$$\sum_{i=1}^{m} \Big| \sum_{j=1}^{n} a_{ij} y_j \Big|^2 \le C \sum_{j=1}^{n} |y_j|^2.$$


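Proof. Assume for instance that 1) holds. Then for all $z_i, y_j \in \mathbb{C}$ we have, by
Cauchy-Schwarz,

$$\Big| \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} y_j z_i \Big|^2 \le \sum_{j=1}^{n} |y_j|^2 \cdot \sum_{j=1}^{n} \Big| \sum_{i=1}^{m} a_{ij} z_i \Big|^2 \le C \sum_{i=1}^{m} |z_i|^2 \cdot \sum_{j=1}^{n} |y_j|^2.$$

By choosing for $z_i$ the complex conjugate of $\sum_j a_{ij} y_j$, we obtain the inequality
in 2). The converse is proved in exactly the same way. □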


Montgomery and Vaughan’s theorem 


The key technical ingredient in the proof of theorem 2.A.7 is the following 
delicate result [57]. 


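Theorem 2.A.11. (Montgomery and Vaughan) Let $x_1, x_2, \ldots, x_n$ be real
numbers such that $\|x_i - x_j\| \ge \varepsilon > 0$ for all $i \ne j$.
Then for all $z_1, z_2, \ldots, z_n \in \mathbb{C}$

$$\Big| \sum_{i \ne j} \frac{z_i \overline{z_j}}{\sin \pi (x_i - x_j)} \Big| \le \frac{1}{\varepsilon} \sum_{i=1}^{n} |z_i|^2.$$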


The proof of theorem 2.A.11 is a very nice mixture of elementary, analytic
and algebraic arguments. A first crucial ingredient is Euler's famous identity
(that we will take for granted)

$$\frac{\pi}{\sin \pi x} = \lim_{N \to \infty} \sum_{|k| \le N} \frac{(-1)^k}{x + k}.$$










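Since

$$\sum_{|k| \le N} \frac{(-1)^k |k|}{x + k} = \sum_{k=1}^{N} \frac{2x (-1)^k k}{x^2 - k^2} = o(N)$$

(the last equality is immediate, since the $k$th term of the last sum tends to 0 as $k \to \infty$), we can also
write

$$\frac{\pi}{\sin \pi x} = \lim_{N \to \infty} \sum_{|k| \le N} \left( 1 - \frac{|k|}{N} \right) \frac{(-1)^k}{x + k}.$$

Thus, it is enough to prove that for all $N \ge 1$ we have

$$\Big| \sum_{i \ne j} z_i \overline{z_j} \sum_{|k| \le N} \left( 1 - \frac{|k|}{N} \right) \frac{(-1)^k}{x_i - x_j + k} \Big| \le \frac{\pi}{\varepsilon} \sum_{i=1}^{n} |z_i|^2.$$

Now, since there are $N - |k|$ solutions of the equation $j_1 - j_2 = k$ with $j_1, j_2 \in
\{1, 2, \ldots, N\}$, it is not difficult to see that the inequality is equivalent to

$$\Big| \sum_{i_1 \ne i_2} \sum_{1 \le j_1, j_2 \le N} \frac{(-1)^{j_1} z_{i_1} \overline{(-1)^{j_2} z_{i_2}}}{x_{i_1} + j_1 - x_{i_2} - j_2} \Big| \le \frac{\pi N}{\varepsilon} \sum_{i=1}^{n} |z_i|^2.$$

This follows from the following vast generalization of a famous inequality due
to Hilbert, applied to the family of real numbers $(x_i + j)_{1 \le i \le n,\, 1 \le j \le N}$ and to
the family of complex numbers $((-1)^j z_i)_{1 \le i \le n,\, 1 \le j \le N}$.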


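Theorem 2.A.12. (Montgomery-Vaughan) Let $\varepsilon > 0$ and let $x_1, x_2, \ldots, x_n \in \mathbb{R}$
be such that $|x_i - x_j| \ge \varepsilon$ for all $i \ne j$. Then for any complex numbers
$z_1, z_2, \ldots, z_n$

$$\Big| \sum_{i \ne j} \frac{z_i \overline{z_j}}{x_i - x_j} \Big| \le \frac{\pi}{\varepsilon} \sum_{i=1}^{n} |z_i|^2.$$
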
Proof. By homogeneity and symmetry we may assume that $\varepsilon = 1$ and that
$x_1 < x_2 < \cdots < x_n$. Then the hypothesis implies that $x_j - x_i \ge j - i$ for $i < j$.
Consider the matrix $A$, where $a_{ij} = \frac{1}{x_i - x_j}$ for $i \ne j$ and $a_{ii} = 0$. Note that $A$ is antisymmetric,
thus $i \cdot A$ is a hermitian matrix, to which proposition 2.A.9 can be applied with



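$C = \pi$. Thus, we may assume that $z = (z_1, \ldots, z_n)$ is an eigenvector of $iA$, so
also of $A$, say $Az = i \lambda z$ for some $\lambda \in \mathbb{R}$.
By Cauchy-Schwarz, the square of the left-hand side of the desired inequality
is bounded by $\sum_i |z_i|^2 \cdot \sum_i \big| \sum_{j \ne i} a_{ij} z_j \big|^2$, so it is enough to prove the
inequality

$$\sum_i \Big| \sum_{j \ne i} a_{ij} z_j \Big|^2 \le \pi^2 \sum_j |z_j|^2.$$

Expanding brutally the left-hand side and using the crucial observation that
$a_{ij} a_{ik} = a_{jk} (a_{ij} - a_{ik})$ yields

$$LHS = \sum_i \sum_{j, k \ne i} a_{ij} a_{ik} \overline{z_j} z_k = \sum_j |z_j|^2 \sum_{i \ne j} a_{ij}^2 + \sum_{j \ne k} \overline{z_j} z_k a_{jk} \Big( \sum_{i \ne j} a_{ij} - \sum_{i \ne k} a_{ik} + 2 a_{jk} \Big).$$

It is now time to use the fact that $z$ is an eigenvector:

$$\sum_{j \ne k} \overline{z_j} z_k a_{jk} \Big( \sum_{i \ne j} a_{ij} \Big) = \sum_j \overline{z_j} \Big( \sum_{k \ne j} a_{jk} z_k \Big) \Big( \sum_{i \ne j} a_{ij} \Big) = i \lambda \sum_j |z_j|^2 \sum_{i \ne j} a_{ij}.$$

Doing the same with the other sum, we finally obtain that the term

$$\sum_{j \ne k} \overline{z_j} z_k a_{jk} \Big( \sum_{i \ne j} a_{ij} \Big)$$

cancels with the other similar term, so that

$$LHS = \sum_j |z_j|^2 \sum_{i \ne j} a_{ij}^2 + \sum_{j \ne k} 2 \overline{z_j} z_k a_{jk}^2.$$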




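Now, using the AM-GM inequality $|2 \overline{z_j} z_k| \le |z_j|^2 + |z_k|^2$ and rearranging
terms, we obtain

$$LHS \le 3 \sum_j |z_j|^2 \sum_{i \ne j} a_{ij}^2.$$

As $|x_i - x_j| \ge |i - j|$, we have for any fixed $j$ that⁶

$$\sum_{i \ne j} a_{ij}^2 \le \sum_{i \ne j} \frac{1}{(i - j)^2} \le 2 \sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{3}.$$

Combining the last two inequalities yields the desired result. □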
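
The inequality of theorem 2.A.12 can be sanity-checked numerically. The sketch below is only an illustration: the function names `hilbert_form` and `hilbert_bound` and the sample data are our own assumptions. It evaluates the bilinear form on the left and the bound $\frac{\pi}{\varepsilon} \sum |z_i|^2$ on the right, taking $\varepsilon$ to be the smallest gap between the points.

```python
import math


def hilbert_form(xs, zs):
    """|sum over i != j of z_i * conj(z_j) / (x_i - x_j)|."""
    total = 0j
    for i, (xi, zi) in enumerate(zip(xs, zs)):
        for j, (xj, zj) in enumerate(zip(xs, zs)):
            if i != j:
                total += zi * complex(zj).conjugate() / (xi - xj)
    return abs(total)


def hilbert_bound(xs, zs):
    """(pi / eps) * sum |z_i|^2, with eps the minimal spacing of the x_i."""
    eps = min(abs(a - b) for i, a in enumerate(xs) for b in xs[i + 1:])
    return math.pi / eps * sum(abs(z) ** 2 for z in zs)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.5, 4.0, 5.2, 7.9]
    zs = [1 + 2j, -1j, 0.5, 2 - 1j, -3 + 0.5j, 1j]
    print(hilbert_form(xs, zs), "<=", hilbert_bound(xs, zs))
```

A finite check of this kind cannot replace the proof, but it is a quick way to catch a misremembered constant.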


Proof of theorem 2.A.7 


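Let $a_{kj} = e^{2i\pi k x_j}$ for $1 \le j \le n$ and $M < k \le M + N$. We need to prove
that for all $(z_k)_{M < k \le M+N}$ we have

$$\sum_{j=1}^{n} \Big| \sum_{M < k \le M+N} a_{kj} z_k \Big|^2 \le \left( N + \frac{1}{\varepsilon} \right) \sum_{M < k \le M+N} |z_k|^2$$

and by the duality principle (proposition 2.A.10) it is enough to prove that
for all $y_1, \ldots, y_n$ we have

$$\sum_{M < k \le M+N} \Big| \sum_{j=1}^{n} a_{kj} y_j \Big|^2 \le \left( N + \frac{1}{\varepsilon} \right) \sum_{j=1}^{n} |y_j|^2.$$

Expanding the left-hand side, we obtain the equivalent inequality

$$\sum_{j_1 \ne j_2} y_{j_1} \overline{y_{j_2}} \sum_{M < k \le M+N} e^{2i\pi k (x_{j_1} - x_{j_2})} \le \frac{1}{\varepsilon} \sum_{j=1}^{n} |y_j|^2.$$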


⁶ We use here Euler's famous identity $\sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6}$.




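Using the formula for the sum of a geometric series, a small computation shows
that for all $u \notin \mathbb{Z}$

$$\sum_{M < k \le M+N} e^{2i\pi k u} = \frac{1}{2i \sin(\pi u)} \left( e^{i\pi u (2M + 2N + 1)} - e^{i\pi u (2M + 1)} \right).$$

By the triangle inequality, it is thus enough to prove that for any
$s \in \{2M + 2N + 1, 2M + 1\}$ and for any $y_1, \ldots, y_n$ we have

$$\Big| \sum_{j_1 \ne j_2} \frac{y_{j_1} \overline{y_{j_2}} \, e^{i\pi (x_{j_1} - x_{j_2}) s}}{\sin \pi (x_{j_1} - x_{j_2})} \Big| \le \frac{1}{\varepsilon} \sum_{j=1}^{n} |y_j|^2.$$

But this follows from theorem 2.A.11 with $z_j = y_j \cdot e^{i\pi s x_j}$. □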
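
The closed form for the geometric sum used in this proof is the kind of "small computation" in which signs and exponents are easily misplaced, so a numerical comparison may be reassuring. In the following sketch the names `geometric_sum` and `closed_form` are ours; the first sums $e^{2i\pi k u}$ over $M < k \le M + N$ directly, the second evaluates the closed expression.

```python
import cmath
import math


def geometric_sum(M, N, u):
    """Direct evaluation of the sum of exp(2*i*pi*k*u) for M < k <= M + N."""
    return sum(cmath.exp(2j * math.pi * k * u) for k in range(M + 1, M + N + 1))


def closed_form(M, N, u):
    """Closed form; requires u not an integer, so that sin(pi*u) != 0."""
    numerator = (cmath.exp(1j * math.pi * u * (2 * M + 2 * N + 1))
                 - cmath.exp(1j * math.pi * u * (2 * M + 1)))
    return numerator / (2j * math.sin(math.pi * u))


if __name__ == "__main__":
    for u in (0.3, -1.27, 2.5):
        print(abs(geometric_sum(5, 12, u) - closed_form(5, 12, u)))
```

The printed differences should be at the level of floating-point rounding.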


A quick proof of a weaker form of the sieve inequality 


In this section we present Gallagher’s short and beautiful proof of the 
following weaker form of the large sieve inequality. It is much simpler than 
the proof presented in the previous sections and the result it establishes is 
good enough in most applications of the large sieve. 


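Theorem 2.A.13. Let $x_1, x_2, \ldots, x_n$ be real numbers such that

$$\|x_i - x_j\| \ge \varepsilon > 0$$

for all $i \ne j$. Let

$$T(x) = \sum_{M < k \le M+N} z_k e^{2i\pi k x}$$

be a trigonometric polynomial, where $M, N \in \mathbb{N}$ and $z_{M+1}, \ldots, z_{M+N} \in \mathbb{C}$.
Then

$$\sum_{j=1}^{n} |T(x_j)|^2 \le \left( \pi N + \frac{1}{\varepsilon} \right) \sum_{M < k \le M+N} |z_k|^2.$$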




Note that the only difference between this theorem and theorem 2.A.7 is
the factor $\pi N$, which is $N$ in theorem 2.A.7.


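Proof. Assume that $f$ is a continuously differentiable complex-valued map on
$\mathbb{R}$. Integrating by parts, it is easy to establish the equality

$$\varepsilon \cdot f(x_j) = \int_{x_j - \frac{\varepsilon}{2}}^{x_j + \frac{\varepsilon}{2}} f(t)\,dt + \int_{x_j - \frac{\varepsilon}{2}}^{x_j} \left( t - x_j + \frac{\varepsilon}{2} \right) f'(t)\,dt + \int_{x_j}^{x_j + \frac{\varepsilon}{2}} \left( t - x_j - \frac{\varepsilon}{2} \right) f'(t)\,dt.$$

Using the triangle inequality and the fact that

$$\left| t - x_j + \frac{\varepsilon}{2} \right| \le \frac{\varepsilon}{2}$$

for $t \in [x_j - \frac{\varepsilon}{2}, x_j]$ (and a similar inequality for $t \in [x_j, x_j + \frac{\varepsilon}{2}]$), we obtain

$$|f(x_j)| \le \frac{1}{\varepsilon} \int_{x_j - \frac{\varepsilon}{2}}^{x_j + \frac{\varepsilon}{2}} |f(t)|\,dt + \frac{1}{2} \int_{x_j - \frac{\varepsilon}{2}}^{x_j + \frac{\varepsilon}{2}} |f'(t)|\,dt.$$

Take $f(x) = T(x)^2$ and add the corresponding inequalities. The hypothesis
$\|x_i - x_j\| \ge \varepsilon$ ensures that the intervals $(x_j - \frac{\varepsilon}{2}, x_j + \frac{\varepsilon}{2})$ do not overlap mod
1, so using the 1-periodicity of $f$ we obtain

$$\sum_{j=1}^{n} |T(x_j)|^2 \le \frac{1}{\varepsilon} \int_0^1 |T(x)|^2\,dx + \int_0^1 |T(x)| \cdot |T'(x)|\,dx.$$

Using Parseval's equality and the Cauchy-Schwarz inequality, we obtain

$$\sum_{j=1}^{n} |T(x_j)|^2 \le \frac{1}{\varepsilon} \sum_{k=M+1}^{M+N} |z_k|^2 + \sqrt{\sum_{k=M+1}^{M+N} |z_k|^2} \cdot \sqrt{\sum_{k=M+1}^{M+N} |2\pi k z_k|^2} \le \left( \frac{1}{\varepsilon} + 2\pi \max_{M < k \le M+N} |k| \right) \sum_{k=M+1}^{M+N} |z_k|^2.$$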




So we are done if we can ensure that $\max_{M < k \le M+N} |k| \le \frac{N}{2}$. Of course, for
arbitrary $M$ and $N$ it is unreasonable to hope for such an inequality, but a
moment of thought shows that $M$ plays no role in the theorem we are trying
to prove, so we can simply choose $M = -\left[ \frac{N+1}{2} \right]$ to finish the proof. □


Arithmetic forms of the large sieve inequality 


We will apply theorem 2.A.7 for a special family of well-spaced numbers
$x_i$: pick an integer $Q \ge 1$ and consider all numbers of the form $\frac{a}{q}$ with
$\gcd(a, q) = 1$ and $1 \le a \le q \le Q$. It is clear that the difference between
two such numbers is (in absolute value) at least $\frac{1}{Q^2}$. Applying the large sieve
inequality to this collection, we deduce the following


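Theorem 2.A.14. Let

$$T(x) = \sum_{M < k \le M+N} a_k e^{2i\pi k x}$$

be a trigonometric polynomial. Then for all $Q \ge 1$ we have

$$\sum_{q=1}^{Q} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \Big| T\Big( \frac{a}{q} \Big) \Big|^2 \le (N + Q^2) \sum_{M < k \le M+N} |a_k|^2.$$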
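
To get a feeling for theorem 2.A.14, one can evaluate both sides for a concrete trigonometric polynomial. The sketch below is an illustration under our own assumptions: the function name `farey_large_sieve` and the test coefficients are hypothetical, and `coeffs[idx]` stands for $a_k$ with $k = M + 1 + idx$.

```python
import cmath
import math


def farey_large_sieve(coeffs, M, Q):
    """Return (lhs, rhs) of the large sieve inequality over fractions a/q, q <= Q."""
    N = len(coeffs)

    def T(x):
        # T(x) = sum of a_k * exp(2*i*pi*k*x) for M < k <= M + N
        return sum(a * cmath.exp(2j * math.pi * (M + 1 + idx) * x)
                   for idx, a in enumerate(coeffs))

    lhs = sum(abs(T(a / q)) ** 2
              for q in range(1, Q + 1)
              for a in range(1, q + 1)
              if math.gcd(a, q) == 1)
    rhs = (N + Q * Q) * sum(abs(a) ** 2 for a in coeffs)
    return lhs, rhs


if __name__ == "__main__":
    coeffs = [complex(math.cos(k * k), math.sin(3 * k)) for k in range(40)]
    print(farey_large_sieve(coeffs, M=3, Q=6))
```

Taking all coefficients equal to 1 makes the term $q = 1$ alone contribute $N^2$, which shows that the factor $N$ on the right cannot be dropped.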


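Next, let us observe that

$$T\Big( \frac{a}{q} \Big) = \sum_k a_k e^{\frac{2i\pi k a}{q}} = \sum_{h=1}^{q} \Big( \sum_{k \equiv h \pmod q} a_k \Big) e^{\frac{2i\pi h a}{q}}.$$

Therefore, using the techniques of addendum 7.A, more precisely Plancherel's
formula (theorem 7.A.5, but this can also be done by expanding everything),
we obtain

$$\sum_{a=1}^{q} \Big| T\Big( \frac{a}{q} \Big) \Big|^2 = q \sum_{h=1}^{q} \Big| \sum_{k \equiv h \pmod q} a_k \Big|^2.$$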
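
The Plancherel identity $\sum_{a=1}^{q} |T(a/q)|^2 = q \sum_{h=1}^{q} \big| \sum_{k \equiv h \pmod q} a_k \big|^2$ can also be verified "by expanding everything" on a machine. In this hedged sketch the name `plancherel_sides` and the sample coefficients are our own; `coeffs[idx]` again stands for $a_k$ with $k = M + 1 + idx$.

```python
import cmath
import math


def plancherel_sides(coeffs, M, q):
    """Return both sides of the Plancherel identity for the modulus q."""

    def T(x):
        return sum(a * cmath.exp(2j * math.pi * (M + 1 + idx) * x)
                   for idx, a in enumerate(coeffs))

    lhs = sum(abs(T(a / q)) ** 2 for a in range(1, q + 1))

    # Sum the coefficients over each residue class h mod q.
    class_sums = [0j] * q
    for idx, a in enumerate(coeffs):
        class_sums[(M + 1 + idx) % q] += a
    rhs = q * sum(abs(s) ** 2 for s in class_sums)
    return lhs, rhs


if __name__ == "__main__":
    print(plancherel_sides([1, 2j, -1, 0.5, 3, 1 - 1j, 2], M=4, q=5))
```

The two returned numbers should agree up to rounding, for any modulus $q$, prime or not.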










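Now, if the $a_k$ were uniformly distributed, one would expect that
$\sum_{k \equiv h \pmod q} a_k$ behaves as $E_q = \frac{1}{q} \sum_k a_k$. Actually, a small computation
reveals that

$$\sum_{h=1}^{q} \Big| \sum_{k \equiv h \pmod q} a_k - E_q \Big|^2 = \sum_{h=1}^{q} \Big| \sum_{k \equiv h \pmod q} a_k \Big|^2 - q |E_q|^2,$$

which combined with the previous formula yields

$$\sum_{a=1}^{q-1} \Big| T\Big( \frac{a}{q} \Big) \Big|^2 = q \sum_{h=1}^{q} \Big| \sum_{k \equiv h \pmod q} a_k - E_q \Big|^2.$$

Unfortunately, $\sum_{a=1}^{q-1} \big| T\big( \frac{a}{q} \big) \big|^2$ is usually larger than

$$\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \Big| T\Big( \frac{a}{q} \Big) \Big|^2,$$

but if $q$ is a prime number, they are actually equal! Using these remarks and
the previous theorem, we obtain the following strong inequality: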








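Theorem 2.A.15. Let $(a_k)_{M < k \le M+N}$ be a sequence of complex numbers.
Then for all integers $Q \ge 1$ we have

$$\sum_{p \le Q} p \sum_{h=1}^{p} \Big| \sum_{k \equiv h \pmod p} a_k - \frac{1}{p} \sum_k a_k \Big|^2 \le (N + Q^2) \sum_k |a_k|^2,$$

the sum being taken over all primes $p \le Q$.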


By specializing $a_k = 1_{k \in A}$ for a subset $A \subset [M + 1, M + N]$, we obtain
the following equidistribution result:


Corollary 2.A.16. Let $A \subset [M + 1, M + N]$ be a set of integers. Then for
all integers $Q \ge 1$ we have

$$\sum_{p \le Q} p \sum_{h=1}^{p} \Big( |A \cap (h + p\mathbb{Z})| - \frac{|A|}{p} \Big)^2 \le (N + Q^2) |A|.$$




This finally yields a “sieve inequality.” 


Theorem 2.A.17. Let $A \subset [M + 1, M + N]$ be a set of integers and suppose
that for each prime $p \le Q$ at least $w(p)$ residue classes mod $p$ contain no
element of $A$. Then

$$|A| \le \frac{N + Q^2}{\sum_{p \le Q} \frac{w(p)}{p}}.$$

Proof. Use the previous corollary and the observation that for each $p$ we have

$$\sum_{h=1}^{p} \Big( |A \cap (h + p\mathbb{Z})| - \frac{|A|}{p} \Big)^2 \ge w(p) \frac{|A|^2}{p^2},$$

as at least $w(p)$ terms in the sum are equal to $\frac{|A|^2}{p^2}$. □
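
As an illustration of theorem 2.A.17, consider the perfect squares up to $N$: modulo an odd prime $p$ they avoid the $\frac{p-1}{2}$ classes of quadratic non-residues. The following sketch (function names are ours; $w(p)$ is computed directly as the number of residue classes missed by the given set) evaluates the upper bound $\frac{N + Q^2}{\sum_{p \le Q} w(p)/p}$.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for multiple in range(i * i, limit + 1, i):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def sieve_bound(A, M, N, Q):
    """(N + Q^2) / sum_{p <= Q} w(p)/p for a set A inside [M+1, M+N]."""
    if not all(M + 1 <= a <= M + N for a in A):
        raise ValueError("A must lie in [M + 1, M + N]")
    denominator = 0.0
    for p in primes_up_to(Q):
        w = p - len({a % p for a in A})  # residue classes mod p missed by A
        denominator += w / p
    return (N + Q * Q) / denominator if denominator > 0 else float("inf")


if __name__ == "__main__":
    squares = [k * k for k in range(1, 101)]  # the 100 squares up to 10**4
    print(len(squares), "<=", sieve_bound(squares, 0, 10**4, 100))
```

The bound is far from the true size here, but it is already much smaller than the trivial bound $N$, using nothing about $A$ beyond the classes it avoids.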


The previous theorem helps to understand the name "large sieve." Indeed,
in typical applications $A \pmod p$ will miss a good part of the residue
classes modulo $p$ (i.e. the integers $w(p)$ will be quite large) and the sieve
inequality will yield nontrivial bounds on $|A|$. The next result, an amazing
theorem of Linnik, is at the very origin of the large sieve and one of its best
applications. It concerns the distribution of the least quadratic nonresidue
modulo $p$. Vinogradov conjectured that it is smaller than $c(\varepsilon) p^\varepsilon$ for any $\varepsilon > 0$
and if one assumes the Generalized Riemann Hypothesis, then one can actually
prove that it is smaller than $c (\log p)^2$. The following deep theorem due to
Linnik [47] is a first step in the direction of Vinogradov's conjecture (which is
still largely open):


Theorem 2.A.18. (Linnik) Let $n(p)$ be the least positive integer which is not
a quadratic residue mod $p$. Then for all $\varepsilon > 0$ there is a constant $c(\varepsilon)$ such
that for all $N$ we have

$$|\{p \le N \mid n(p) > N^\varepsilon\}| \le c(\varepsilon).$$


Proof. Fix $\varepsilon > 0$, which for simplicity (but without loss of generality) we take
of the form $\frac{1}{d}$ for some integer $d$ greater than 2. For a positive integer $N$, let

$$P_N = \{p \le \sqrt{N} \mid n(p) > N^\varepsilon\}$$




and

$$A_N = \Big\{ n \le N \;\Big|\; \Big( \frac{n}{p} \Big) = 1 \text{ for all } p \in P_N \Big\}.$$

We will prove the following technical result: 


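Lemma 2.A.19. There exist positive constants $c(\varepsilon)$ and $c_1(\varepsilon)$ such that for
all $N \ge c_1(\varepsilon)$ we have $|A_N| \ge c(\varepsilon) N$.


Let us assume for a moment that this holds. Since $A_N \pmod p$ misses all
quadratic non-residues mod $p$ for $p \in P_N$, we can choose $w(p) = \frac{p-1}{2} \ge \frac{p}{3}$ for these primes in the previous
theorem (with $Q = \sqrt{N}$) and obtain $|A_N| \le \frac{6N}{|P_N|}$. Hence for all $N \ge c_1(\varepsilon)$ we have $|P_N| \le \frac{6}{c(\varepsilon)}$,
which is enough to conclude the theorem.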

Let us prove the lemma. Note that $A_N$ contains all numbers $n \le N$ all
of whose prime factors are smaller than $N^\varepsilon$, as Legendre's symbol is
multiplicative. We will consider all numbers $n = k p_1 \cdots p_d$ smaller than $N$,
with $p_i \in [N^{\varepsilon - \varepsilon^2}, N^\varepsilon]$ in nondecreasing order and some integer $k$. Note that
$k \le N^\varepsilon$, as $n \le N$, so $k$ is relatively prime to any prime greater than $N^\varepsilon$.
This easily implies that all such numbers are different and they clearly belong
to $A_N$. Since there are more than $\frac{N}{p_1 \cdots p_d} - 1$ possible values for $k$, it follows
that we have at least

$$\sum_{\substack{p_1 \le \cdots \le p_d \\ p_i \in [N^{\varepsilon - \varepsilon^2}, N^\varepsilon]}} \Big( \frac{N}{p_1 \cdots p_d} - 1 \Big) \ge \frac{N}{d!} \Big( \sum_{p \in [N^{\varepsilon - \varepsilon^2}, N^\varepsilon]} \frac{1}{p} \Big)^d - \pi(N^\varepsilon)^d$$

such numbers. Taking into account the fact⁷ that $\pi(x) = o(x)$ and

$$\sum_{p \in [a, b]} \frac{1}{p} = \log \frac{\log b}{\log a} + o(1)$$

as $a, b \to \infty$, the last quantity is easily seen to be greater than $c(\varepsilon) N$ for
some positive constant $c(\varepsilon)$ and all sufficiently large (depending on $\varepsilon$) $N$. The
desired result follows. □
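
The quantity $n(p)$ studied by Linnik is easy to compute for small primes. The sketch below is ours (the names `least_quadratic_nonresidue` and `exceptional_primes` are hypothetical); it uses Euler's criterion, $n^{(p-1)/2} \equiv -1 \pmod p$ for non-residues, and lists the primes $p \le N$ with $n(p) > N^\varepsilon$, the set whose size the theorem bounds.

```python
def is_prime(n):
    """Trial division, adequate for the small ranges used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_quadratic_nonresidue(p):
    """Least positive integer that is not a square mod the odd prime p."""
    for n in range(2, p):
        if pow(n, (p - 1) // 2, p) == p - 1:  # Euler's criterion
            return n
    raise ValueError("p must be an odd prime")


def exceptional_primes(N, eps):
    """Odd primes p <= N whose least non-residue exceeds N**eps."""
    return [p for p in range(3, N + 1)
            if is_prime(p) and least_quadratic_nonresidue(p) > N ** eps]


if __name__ == "__main__":
    print([(p, least_quadratic_nonresidue(p)) for p in (7, 23, 71, 311)])
    print(exceptional_primes(2000, 0.25))
```

Experiments of this kind suggest how rarely $n(p)$ is large, which is exactly what the theorem quantifies.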


⁷ Both these results are proved in addendum 3.A.




Though theorem 2.A.17 is already a very strong result, in most applications
one needs more refined estimates, in which one takes into account all
$q \le Q$, not only prime numbers. In order to do this, we need another nice
application of finite Fourier analysis and Cauchy-Schwarz, but before doing that
recall that $\mu$ is Möbius's function, defined by $\mu(n) = (-1)^k$ if $n$ is a product
of $k \ge 0$ distinct prime numbers and $\mu(n) = 0$ otherwise.


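Theorem 2.A.20. Let $A \subset [M + 1, M + N]$ be a set of integers and suppose
that for each prime $p \le Q$ we have

$$|\{a \pmod p \mid a \in A\}| \le p - w(p).$$

Then for all trigonometric polynomials $T(x) = \sum_{k \in A} a_k e^{2i\pi k x}$ and all positive
integers $q$

$$\sum_{\substack{\gcd(a,q)=1 \\ a \le q}} \Big| T\Big( \frac{a}{q} \Big) \Big|^2 \ge |T(0)|^2 \mu(q)^2 \prod_{p \mid q} \frac{w(p)}{p - w(p)}.$$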
Proof. Let

$$g(q) = \mu(q)^2 \prod_{p \mid q} \frac{w(p)}{p - w(p)},$$

a multiplicative function. We will prove the theorem in two steps: first we
will prove it when $q$ is prime and then we will show that if the theorem holds
for two relatively prime numbers $q, q'$, then it also holds for $qq'$. First, assume
that $q$ is prime. Let

$$x_h = \sum_{k \in A,\, k \equiv h \pmod q} a_k.$$

Then at least $w(q)$ of the numbers $x_h$ are zero (by hypothesis), so by Cauchy-Schwarz
and Plancherel's identity we get

$$|T(0)|^2 = \Big| \sum_h x_h \Big|^2 \le (q - w(q)) \sum_h |x_h|^2 = \Big( 1 - \frac{w(q)}{q} \Big) \sum_{a=1}^{q} \Big| T\Big( \frac{a}{q} \Big) \Big|^2,$$

from where the result follows easily. Next, assume that the result holds for $q$
and $q'$, with $\gcd(q, q') = 1$. Using the Chinese Remainder Theorem and the



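fact that the theorem holds for $q$ (with $T\left( \frac{b}{q'} + x \right)$ instead of $T(x)$) and then
for $q'$, we have

$$\sum_{a \in (\mathbb{Z}/qq'\mathbb{Z})^*} \Big| T\Big( \frac{a}{qq'} \Big) \Big|^2 = \sum_{b \in (\mathbb{Z}/q'\mathbb{Z})^*} \sum_{a \in (\mathbb{Z}/q\mathbb{Z})^*} \Big| T\Big( \frac{a}{q} + \frac{b}{q'} \Big) \Big|^2 \ge \sum_{b \in (\mathbb{Z}/q'\mathbb{Z})^*} g(q) \Big| T\Big( \frac{b}{q'} \Big) \Big|^2 \ge g(q) g(q') |T(0)|^2 = g(qq') |T(0)|^2.$$ □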


Combining the special case $T(x) = \sum_{k \in A} e^{2i\pi k x}$ of the previous theorem
with the large sieve inequality (theorem 2.A.14), we obtain the following strong
form of theorem 2.A.17:


Theorem 2.A.21. (Montgomery) Let $A \subset [M + 1, M + N]$ be a set of integers
and suppose that for each prime $p \le Q$ we have

$$|\{a \pmod p \mid a \in A\}| \le p - w(p).$$

Then $|A| \le \frac{N + Q^2}{L}$, where

$$L = \sum_{q \le Q} \mu(q)^2 \prod_{p \mid q} \frac{w(p)}{p - w(p)}.$$


Some applications 


We present here two classical applications of the large sieve: a uniform
upper bound on $\pi(m + n) - \pi(m)$ and an upper bound for the number of twin
primes smaller than $n$.


Theorem 2.A.22. If $m \ge \sqrt{n}$, then there are at most $\frac{4n}{\log n}$ primes between
$m + 1$ and $m + n$.




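Proof. Let $A$ be the set of prime numbers between $m + 1$ and $m + n$. No
element of $A$ is divisible by some prime smaller than or equal to $[\sqrt{n}]$, so by
theorem 2.A.21 we obtain

$$|A| \le \frac{2n}{L}, \quad L = \sum_{q \le \sqrt{n}} \mu(q)^2 \prod_{p \mid q} \frac{1}{p - 1}.$$

Using the fact that $\frac{1}{p - 1} = \sum_{j \ge 1} \left( \frac{1}{p} \right)^j$, we obtain

$$L \ge \sum_{p_1^{i_1} \cdots p_s^{i_s} \le \sqrt{n}} \frac{1}{p_1^{i_1} \cdots p_s^{i_s}} = \sum_{k \le \sqrt{n}} \frac{1}{k} > \log \sqrt{n} = \frac{1}{2} \log n.$$

Inserting this result in the previous inequality yields the desired result. □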
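
The two estimates in this proof, $|A| \le \frac{2n}{L}$ and $L > \frac{1}{2} \log n$, combine to the bound $\frac{4n}{\log n}$. A small experiment makes it concrete. The sketch below is ours (the name `primes_in_window` is hypothetical); it counts the primes between $m + 1$ and $m + n$ by trial division and compares the count with $\frac{4n}{\log n}$.

```python
import math


def is_prime(n):
    """Trial division, adequate for the small ranges used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primes_in_window(m, n):
    """Number of primes p with m + 1 <= p <= m + n."""
    return sum(1 for k in range(m + 1, m + n + 1) if is_prime(k))


if __name__ == "__main__":
    for m, n in [(100, 100), (1000, 100), (10**5, 1000)]:
        print(m, n, primes_in_window(m, n), 4 * n / math.log(n))
```

The interest of the theorem is its uniformity in $m$: the bound depends only on the length $n$ of the window.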


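Theorem 2.A.23. There exists an absolute constant $c$ such that for all $n$
there are at most $\frac{cn}{(\log n)^2}$ primes $p \le n$ such that $p + 2$ is also prime.


Proof. Let $f(n)$ be the number of twin primes smaller than $n$ and let $A$ be
the set of twin primes in $([\sqrt{n}], n]$. If $p \in (2, [\sqrt{n}]]$ is a prime number, then
clearly $0, -2 \notin \{a \pmod p \mid a \in A\}$, so we can take $w(p) = 2$ for such primes
in theorem 2.A.21 (and $w(2) = 0$), deducing that

$$f(n) - f([\sqrt{n}]) \le \frac{2n}{L}, \quad L = \sum_{\substack{q \le \sqrt{n} \\ \gcd(q,2)=1}} \mu(q)^2 \prod_{p \mid q} \frac{2}{p - 2}.$$

It remains to obtain a lower bound on $L$. Since $\frac{2}{p - 2} = \sum_{j \ge 1} \left( \frac{2}{p} \right)^j$, we obtain

$$L \ge \sum_{\substack{p_1^{i_1} \cdots p_s^{i_s} \le \sqrt{n} \\ \min p_i > 2}} \frac{2^{i_1 + \cdots + i_s}}{p_1^{i_1} \cdots p_s^{i_s}} \ge \sum_{\substack{p_1^{i_1} \cdots p_s^{i_s} \le \sqrt{n} \\ \min p_i > 2}} \frac{(i_1 + 1) \cdots (i_s + 1)}{p_1^{i_1} \cdots p_s^{i_s}} = \sum_{\substack{k \le \sqrt{n} \\ \gcd(2,k)=1}} \frac{d(k)}{k} \ge \Big( \sum_{\substack{k \le \sqrt[4]{n} \\ \gcd(2,k)=1}} \frac{1}{k} \Big)^2 \ge c \log^2 n,$$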




where $d(k)$ is the number of divisors of $k$.
Combining the previous estimates, we obtain

$$f(n) - f([\sqrt{n}]) \le \frac{c_1 n}{\log^2 n}$$

for some constant $c_1 > 0$. Using the trivial bound $f([\sqrt{n}]) \le \sqrt{n} = O\left( \frac{n}{\log^2 n} \right)$,
the result follows. □


Using the previous theorem, we can easily prove the following famous 
result, which is probably the origin of all modern sieve theories: 


Theorem 2.A.24. (V. Brun) The sum of the inverses of all twin prime numbers
converges.


Proof. By the previous theorem, there are at most $\frac{c 2^j}{j^2}$ twin primes between
$2^j$ and $2^{j+1}$, for some absolute constant $c$. The sum of the inverses of these
twin primes is smaller than $\frac{c 2^j}{j^2} \cdot \frac{1}{2^j} = \frac{c}{j^2}$. As $\sum_{j \ge 1} \frac{1}{j^2}$ converges, the result
follows. □
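
Brun's theorem can be observed numerically, although the convergence is very slow. The following sketch is ours (the names `twin_primes_up_to` and `brun_partial_sum` are hypothetical); it lists the primes $p \le$ `limit` such that $p + 2$ is also prime and sums $\frac{1}{p} + \frac{1}{p+2}$ over these pairs.

```python
def twin_primes_up_to(limit):
    """Primes p <= limit such that p + 2 is also prime."""
    size = limit + 2
    sieve = [True] * (size + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(size ** 0.5) + 1):
        if sieve[i]:
            for multiple in range(i * i, size + 1, i):
                sieve[multiple] = False
    return [p for p in range(2, limit + 1) if sieve[p] and sieve[p + 2]]


def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p + 2) over twin prime pairs with p <= limit."""
    return sum(1 / p + 1 / (p + 2) for p in twin_primes_up_to(limit))


if __name__ == "__main__":
    for limit in (10**2, 10**4, 10**6):
        print(limit, len(twin_primes_up_to(limit)), brun_partial_sum(limit))
```

The partial sums grow extremely slowly, in contrast with the sum of the inverses of all primes, which diverges.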


These are far from being the most spectacular applications of the large 
sieve (though they are already deep theorems!). We refer the reader to [9], 
[19], [37], [44], for many other applications. 


2.A.3 The Turán-Kubilius inequality 


In this section, we use again the Cauchy-Schwarz inequality to prove a 
famous inequality in analytic number theory, the Turán-Kubilius inequality. 
We then apply it to establish a well-known result of Hardy and Ramanujan 
(which was actually the origin of this inequality) and then a very amusing 
consequence due to Erdős. We also present a dual form of the Turán-Kubilius 
inequality, due to Elliott, which has a similar form to the inequality established 
in theorem 2.A.15 and which turned out to have some pretty far-reaching 
consequences (for instance, a proof of the prime number theorem!). 




A function $f : \{1, 2, \ldots\} \to \mathbb{C}$ is called additive if $f(ab) = f(a) + f(b)$
for all relatively prime positive integers $a, b$. It is called strongly additive if
moreover $f(p^k) = f(p)$ for all primes $p$ and all positive integers $k$. Thus

$$f(n) = \sum_{p \mid n} f(p^{v_p(n)}), \quad \text{respectively} \quad f(n) = \sum_{p \mid n} f(p)$$

when $f$ is additive, respectively strongly additive. In particular, if $f$ is additive
we have

$$\sum_{j=1}^{n} f(j) = \sum_{p^k \le n} \Big( \Big[ \frac{n}{p^k} \Big] - \Big[ \frac{n}{p^{k+1}} \Big] \Big) f(p^k),$$

while if $f$ is strongly additive, then

$$\sum_{j=1}^{n} f(j) = \sum_{p \le n} \Big[ \frac{n}{p} \Big] f(p).$$

Thus

$$E_f(n) = \sum_{p^k \le n} \Big( 1 - \frac{1}{p} \Big) \frac{f(p^k)}{p^k}, \quad \text{respectively} \quad E_f^*(n) = \sum_{p \le n} \frac{f(p)}{p}$$

is a good approximation of the average value of $f(1), \ldots, f(n)$ if $f$ is additive,
respectively strongly additive. Our fundamental result will allow us to see

$$B_f(n) = \sum_{p^k \le n} \frac{|f(p^k)|^2}{p^k}, \quad \text{respectively} \quad B_f^*(n) = \sum_{p \le n} \frac{|f(p)|^2}{p}$$

as an upper bound (up to a constant factor) for the variance of $f$ as a random variable on $\{1, 2, \ldots, n\}$
(with the uniform distribution). Without further ado, we follow [24], which
contains a very easy proof of this theorem.


Theorem 2.A.25. (Turán-Kubilius inequality) There exists an absolute constant
$C > 0$ with the following property:




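1) For all strongly additive maps $f : \{1, 2, \ldots\} \to \mathbb{C}$ and all $n$

$$\frac{1}{n} \sum_{k=1}^{n} |f(k) - E_f^*(n)|^2 \le C \cdot B_f^*(n).$$

2) For all additive maps $f : \{1, 2, \ldots\} \to \mathbb{C}$ and all $n$

$$\frac{1}{n} \sum_{k=1}^{n} |f(k) - E_f(n)|^2 \le C \cdot B_f(n).$$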


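Proof. We will prove only 1), as 2) is proved in exactly the same way. Suppose
first that $f$ takes nonnegative values. Denote $E = E_f^*(n)$, $B^2 = B_f^*(n)$ and
observe that

$$S = \sum_{k=1}^{n} |f(k) - E|^2 = \sum_{k \le n} f(k)^2 - 2E \sum_{k \le n} f(k) + nE^2.$$

As $f$ is strongly additive, we have

$$\sum_{k \le n} f(k)^2 = \sum_{k \le n} \sum_{p, q \mid k} f(p) f(q) = \sum_{p \le n} \Big[ \frac{n}{p} \Big] f(p)^2 + \sum_{\substack{pq \le n \\ p \ne q}} \Big[ \frac{n}{pq} \Big] f(p) f(q)$$

and the last quantity does not exceed $nB^2 + nE^2$, as $f$ takes nonnegative
values and $[x] \le x$ for all $x$. On the other hand, since $[x] > x - 1$ for all $x$, we
can write

$$-2E \sum_{k \le n} f(k) = -2E \sum_{p \le n} \Big[ \frac{n}{p} \Big] f(p) \le -2nE^2 + 2E \sum_{p \le n} f(p).$$

All in all, we obtain

$$S \le nB^2 + 2E \sum_{p \le n} f(p).$$

Next, two applications of Cauchy-Schwarz show that

$$E \cdot \sum_{p \le n} f(p) \le \sqrt{\sum_{p \le n} \frac{1}{p} \cdot \sum_{p \le n} \frac{f(p)^2}{p}} \cdot \sqrt{\sum_{p \le n} p \cdot \sum_{p \le n} \frac{f(p)^2}{p}} = B^2 \cdot \sqrt{\sum_{p \le n} p \cdot \sum_{p \le n} \frac{1}{p}}.$$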




Using the classical estimates⁸

$$\sum_{p \le n} \frac{1}{p} = O(\log \log n), \quad \sum_{p \le n} p = O\Big( \frac{n^2}{\log n} \Big),$$

we deduce that

$$S \le nB^2 \Big( 1 + O\Big( \sqrt{\frac{\log \log n}{\log n}} \Big) \Big)$$

and the result follows.
Suppose now that $f$ is real-valued and define two strongly additive functions
$g_1, g_2$ by

$$g_1(p) = 1_{f(p) > 0} \cdot f(p), \quad g_2(p) = -1_{f(p) < 0} \cdot f(p).$$

Using Cauchy-Schwarz, we get (with $E = E_f^*(n)$, $E_j = E_{g_j}^*(n)$)

$$\sum_{k \le n} |f(k) - E|^2 \le 2 \sum_{j=1}^{2} \sum_{k=1}^{n} |g_j(k) - E_j|^2$$

and the result follows easily by applying the result of the previous paragraph
to $g_1, g_2$. Finally, if $f$ takes on complex values, set $g_1 = \mathrm{Re}(f)$ and $g_2 = \mathrm{Im}(f)$
and use the same argument. □


Corollary 2.A.26. (Turán) If $\omega(n)$, $\Omega(n)$ is the number of prime factors of
$n$ without (respectively with) multiplicity, then

$$\sum_{k \le n} (\omega(k) - \log \log n)^2 = O(n \log \log n)$$

and

$$\sum_{k \le n} (\Omega(k) - \log \log n)^2 = O(n \log \log n).$$


⁸ Which follow easily from the results of the addendum 3.A.




Proof. Use the Turán-Kubilius inequality with $f = \omega$, respectively $f = \Omega$.
Note that if $f = \omega$, then

$$E_f^*(n) = B_f^*(n) = \sum_{p \le n} \frac{1}{p} = \log \log n + O(1),$$

as follows from Mertens' theorem 3.A.5.
Similar arguments apply for $f = \Omega$. □


The following theorem is an immediate consequence of the previous result. 
Its original proof was much more intricate (see [39]). 


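Theorem 2.A.27. (Hardy-Ramanujan)
The functions $\omega$ and $\Omega$ have normal order $\log \log n$, i.e. if $f \in \{\omega, \Omega\}$,
then for all $\varepsilon > 0$ we have

$$\lim_{x \to \infty} \frac{1}{x} \Big| \Big\{ n \le x \;\Big|\; 1 - \varepsilon < \frac{f(n)}{\log \log x} < 1 + \varepsilon \Big\} \Big| = 1.$$


Proof. Let $f \in \{\omega, \Omega\}$ and let

$$A_x = \Big\{ n \le x \;\Big|\; \Big| \frac{f(n)}{\log \log x} - 1 \Big| \ge \varepsilon \Big\}.$$

Since for any $n \in A_x$ we have

$$(f(n) - \log \log x)^2 \ge \varepsilon^2 \cdot (\log \log x)^2,$$

we deduce that

$$\varepsilon^2 \cdot (\log \log x)^2 \cdot |A_x| \le \sum_{n \in A_x} (f(n) - \log \log x)^2 \le \sum_{n \le x} (f(n) - \log \log x)^2.$$

Using the previous corollary, the result follows. □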
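
Turán's variance estimate, which drives this proof, can be observed experimentally. In the following sketch (the names `omega_table` and `mean_square_deviation` are ours) a sieve computes $\omega(k)$ for all $k \le n$, and we then evaluate $\frac{1}{n} \sum_{k \le n} (\omega(k) - \log \log n)^2$, which corollary 2.A.26 bounds by a constant times $\log \log n$.

```python
import math


def omega_table(n):
    """omega[k] = number of distinct prime factors of k, for 0 <= k <= n."""
    omega = [0] * (n + 1)
    for p in range(2, n + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, n + 1, p):
                omega[multiple] += 1
    return omega


def mean_square_deviation(n):
    """(1/n) * sum over k <= n of (omega(k) - log log n)^2, for n >= 3."""
    omega = omega_table(n)
    center = math.log(math.log(n))
    return sum((omega[k] - center) ** 2 for k in range(1, n + 1)) / n


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, mean_square_deviation(n), math.log(math.log(n)))
```

Since $\log \log n$ grows so slowly, such experiments show the normal order only faintly; the theorem is an asymptotic statement.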


Here is a very beautiful consequence of the Hardy-Ramanujan theorem, 
which is surprisingly difficult to prove by other means: 




Theorem 2.A.28. (Erdős) We have

$$\lim_{n \to \infty} \frac{|\{a \cdot b \mid 1 \le a, b \le n\}|}{n^2} = 0.$$


Proof. By the previous theorem, there are $n^2 + o(n^2)$ pairs $(a, b)$ with $1 \le
a, b \le n$ and such that $\Omega(ab)$ is about $2 \log \log n$. The same theorem shows
that $n^2 + o(n^2)$ numbers $k \in \{1, 2, \ldots, n^2\}$ have $\Omega(k)$ about

$$\log \log n^2 = \log \log n + \log 2 \ll 2 \log \log n,$$

so the number of numbers of the form $ab$ (with $1 \le a, b \le n$) must be $o(n^2)$. □

The following difficult theorem (see [28]) refines considerably Hardy- 
Ramanujan’s result. It is also known as the fundamental theorem of prob- 
abilistic number theory. 


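Theorem 2.A.29. (Erdős-Kac) Let $f : \{1, 2, \ldots\} \to \mathbb{R}$ be a strongly additive
function such that the sequence $(f(p))_p$ is bounded ($p$ runs over the prime
numbers) and such that $\lim_{n \to \infty} B_f(n) = \infty$. Then for all real numbers $a$

$$\lim_{x \to \infty} \frac{1}{x} \Big| \Big\{ n \le x \;\Big|\; f(n) - E_f(x) \le a \sqrt{B_f(x)} \Big\} \Big| = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-\frac{u^2}{2}}\,du.$$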


Let us end this section and this addendum with another beautiful application
of the Turán-Kubilius inequality, due to Elliott. We let $v_p(n)$ denote
the exponent of the prime number $p$ in the factorization of $n$.


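Theorem 2.A.30. There exists an absolute constant $C > 0$ such that:


1) For all $x$ and all $a_n \in \mathbb{C}$

$$\sum_{p^k \le x} p^k \Big| \sum_{\substack{n \le x \\ v_p(n) = k}} a_n - \frac{1}{p^k} \Big( 1 - \frac{1}{p} \Big) \sum_{n \le x} a_n \Big|^2 \le Cx \sum_{n \le x} |a_n|^2.$$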



2) For all $x$ and all complex numbers $a_n$

$$\sum_{p \le x} p \Big| \sum_{\substack{n \le x \\ p \mid n}} a_n - \frac{1}{p} \sum_{n \le x} a_n \Big|^2 \le Cx \cdot \sum_{n \le x} |a_n|^2.$$


Before looking at the proof, the reader is strongly advised to compare this 
theorem and theorem 2.A.15. 


Proof. We will prove only the first part, as the same argument yields the
second part. Observe that Turán-Kubilius's inequality can also be written in
the form

$$\sum_{k=1}^{n} \Big| \sum_{p^u \le n} f(p^u) \Big( 1_{v_p(k) = u} - \Big( 1 - \frac{1}{p} \Big) \frac{1}{p^u} \Big) \Big|^2 \le Cn \sum_{p^u \le n} \frac{|f(p^u)|^2}{p^u}.$$

This suggests defining for $j, k \in [1, n]$ the quantity

$$a_{jk} = \Big( 1_{v_p(k) = u} - \Big( 1 - \frac{1}{p} \Big) \frac{1}{p^u} \Big) \sqrt{p^u}$$

if $j = p^u$ for some $u$ and some prime $p$ and $a_{jk} = 0$ otherwise. Given any
sequence $y_1, \ldots, y_n$ of complex numbers, we can certainly define an additive
map $f$ such that $f(p^u) = \sqrt{p^u}\, y_{p^u}$ for all $u, p$ such that $p^u \le n$. Then the
previous inequality can be written

$$\sum_{k=1}^{n} \Big| \sum_{j=1}^{n} a_{jk} y_j \Big|^2 \le nC \sum_{p^u \le n} |y_{p^u}|^2 \le nC \sum_{j=1}^{n} |y_j|^2.$$

Applying the duality principle 2.A.10 and unwinding definitions, we obtain
the desired inequality. □


The previous theorem plays a key role in Hildebrand’s “elementary” proof 
(see [40]) of an extremely difficult theorem of Wirsing [85], [86], that confirmed 
a long-standing conjecture of Erdős and Wintner. 




Theorem 2.A.31. (Wirsing) Let $f : \{1, 2, \ldots\} \to [-1, 1]$ be a multiplicative
function (i.e. $f(ab) = f(a) f(b)$ whenever $\gcd(a, b) = 1$). Then

$$M(f) = \lim_{x \to \infty} \frac{1}{x} \sum_{n \le x} f(n)$$

exists. Moreover, $M(f) = 0$ if $\sum_p \frac{1 - f(p)}{p} = \infty$.


To see how subtle this theorem is, take Möbius' function for $f$: it is fairly
elementary that $\sum_p \frac{1}{p} = \infty$, so the previous theorem yields $\sum_{n \le x} \mu(n) = o(x)$.
But it is elementary to prove that this is equivalent to the prime number
theorem! Even though much easier than Wirsing's original proof, Hildebrand's
proof is still rather technical and we refer the reader to his article [40] for
details.


Chapter 3 


Look at the Exponent 


3.1 Introduction 


If $p$ is a prime, we define the $p$-adic valuation map $v_p : \mathbb{Z} \to \mathbb{N} \cup \{\infty\}$ by:
$v_p(0) = \infty$ and for $n \ne 0$ we have $v_p(n) = k$ if $p^k$ divides $n$, but $p^{k+1}$ does
not divide $n$. More concretely, for $n \ne 0$, $v_p(n)$ is the exponent of the prime
number $p$ in the prime factorization of $n$. The unique factorization of integers
into powers of prime numbers easily yields

$$v_p(ab) = v_p(a) + v_p(b), \quad v_p(a + b) \ge \min(v_p(a), v_p(b)),$$

with equality if $v_p(a) \ne v_p(b)$. The first property allows us to extend this map
$v_p$ to $\mathbb{Q}$ by $v_p\left( \frac{a}{b} \right) = v_p(a) - v_p(b)$ for all nonzero integers $a, b$. It is an easy
exercise to check that this is well-defined (i.e. if $\frac{a}{b} = \frac{c}{d}$, then we get the same
value for $v_p\left( \frac{a}{b} \right)$ and $v_p\left( \frac{c}{d} \right)$).

We will frequently use the following easy properties of the p-adic valuation: 


$$v_p(\gcd(x_1, x_2, \ldots, x_n)) = \min_{1 \le i \le n} v_p(x_i),$$
$$v_p(\operatorname{lcm}(x_1, x_2, \ldots, x_n)) = \max_{1 \le i \le n} v_p(x_i).$$
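
These properties are easy to experiment with. Here is a minimal Python sketch of the valuation for nonzero integers, together with a check of the gcd and lcm rules; the names `vp`, `lcm`, `gcd_all` and `lcm_all` are our own choices, and the value of $v_p(0)$ is represented by `float('inf')`.

```python
from functools import reduce
from math import gcd


def vp(n, p):
    """Exponent of the prime p in n; by convention infinity for n = 0."""
    if n == 0:
        return float("inf")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b


def gcd_all(xs):
    return reduce(gcd, xs)


def lcm_all(xs):
    return reduce(lcm, xs)


if __name__ == "__main__":
    xs = [360, 84, 1350]
    for p in (2, 3, 5, 7):
        print(p, vp(gcd_all(xs), p), min(vp(x, p) for x in xs),
              vp(lcm_all(xs), p), max(vp(x, p) for x in xs))
```

In each printed row the second and third entries agree, and so do the fourth and fifth.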




3.2 Local-global principle 


A very useful idea when dealing with divisibilities (or even equalities con- 
cerning arithmetic objects) is the local-global principle: if a and b are nonzero 
integers, then a divides b if and only if for all primes p we have vp(a) < vp(b). 
This “local-global principle” is the simplest of a series of such statements, 
concerning the way in which the arithmetic of integers is governed by the “be- 
havior at each prime.” Some of these statements are very deep and most of 
them are an active area of research. Here are some other such examples: a pos- 
itive integer is a perfect square if and only if it is a perfect square mod p for all 
primes p (this is already a quite serious result, using the quadratic reciprocity 
law). Another famous example is Hasse-Minkowski's local-global principle: if
$a_i$ are nonzero rational numbers, the equation $a_1 x_1^2 + a_2 x_2^2 + \cdots + a_n x_n^2 = 0$
has nontrivial rational solutions if and only if it has nontrivial real solutions
and nontrivial p-adic solutions for all primes p. This is a deep theorem and we 
refer the reader to the addendum concerning p-adic numbers for more details. 

We start with some easy applications of the local-global principle. 


1. Prove the identity

$$\frac{\operatorname{lcm}(a, b, c)^2}{\operatorname{lcm}(a, b) \cdot \operatorname{lcm}(b, c) \cdot \operatorname{lcm}(c, a)} = \frac{\gcd(a, b, c)^2}{\gcd(a, b) \cdot \gcd(b, c) \cdot \gcd(c, a)}$$

for all positive integers $a, b, c$.
USAMO 1972


Proof. Fix any prime $p$ and let $x = v_p(a)$, $y = v_p(b)$ and $z = v_p(c)$. The $p$-adic
valuation of the left-hand side is

$$2\max(x, y, z) - \max(x, y) - \max(y, z) - \max(z, x),$$

while that of the right-hand side is

$$2\min(x, y, z) - \min(x, y) - \min(y, z) - \min(z, x).$$

We claim that these two quantities are equal. Since the two expressions are
symmetric in $x, y, z$, we may very well assume that $x \ge y \ge z$. Then we need
to prove that $2x - x - y - x = 2z - y - z - z$, which is clear. □
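
The identity can also be confirmed by exact computation on small triples, which is a useful habit before attempting a proof. In the sketch below (the function name `both_sides` is ours) both sides are computed as exact fractions with the standard `fractions` module.

```python
from fractions import Fraction
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b


def both_sides(a, b, c):
    """Left and right sides of the USAMO 1972 identity, as exact fractions."""
    lcm3 = lcm(lcm(a, b), c)
    gcd3 = gcd(gcd(a, b), c)
    left = Fraction(lcm3 ** 2, lcm(a, b) * lcm(b, c) * lcm(c, a))
    right = Fraction(gcd3 ** 2, gcd(a, b) * gcd(b, c) * gcd(c, a))
    return left, right


if __name__ == "__main__":
    print(both_sides(12, 18, 30))
    print(all(left == right
              for a in range(1, 25) for b in range(1, 25) for c in range(1, 25)
              for left, right in [both_sides(a, b, c)]))
```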




Remark 3.1. Another natural proof uses the identity

$$\operatorname{lcm}(x, y) = \frac{xy}{\gcd(x, y)}$$

and its analogue

$$\operatorname{lcm}(x, y, z) = \frac{xyz \gcd(x, y, z)}{\gcd(x, y) \gcd(y, z) \gcd(z, x)}.$$

Plugging in these values immediately yields the result.


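2. Let $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_k$ be positive integers such that $a_i$ and
$b_i$ are relatively prime for all $i \in \{1, 2, \ldots, k\}$. Prove that

\[ \gcd\left(\frac{a_1 m}{b_1}, \frac{a_2 m}{b_2}, \ldots, \frac{a_k m}{b_k}\right) = \gcd(a_1, a_2, \ldots, a_k), \]

where $m = \operatorname{lcm}(b_1, b_2, \ldots, b_k)$.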


IMO 1974 Shortlist 


Proof. Fix a prime $p$ and let $x_i = v_p(a_i)$ and $y_i = v_p(b_i)$. By hypothesis, we
have $\min(x_i, y_i) = 0$ for all $i$ and we need to prove that if

\[ z = \max(y_1, y_2, \ldots, y_k), \]

then

\[ \min(x_1 - y_1 + z, x_2 - y_2 + z, \ldots, x_k - y_k + z) = \min(x_1, x_2, \ldots, x_k). \]

Note that $x_i - y_i + z \ge x_i$ for all $i$, so certainly the left-hand side of the previous
equality is at least the right-hand side. On the other hand, if $z = 0$, then this
forces all $y_i$ to vanish and in this case the equality is clear. So, we may assume
that there is some $j$ such that $y_j = z > 0$. Then $x_j = 0$ and $x_j - y_j + z = 0$,
making the equality clear again. □


94 Chapter 3. Look at the Exponent 


The next problem is quite challenging and requires some preliminaries.
A very common situation in analytic number theory is the following one: we
have two maps $f$ and $g$ and we assume that $g(n) = \sum_{d \mid n} f(d)$ for all $n$. Möbius
found a very nice inversion formula, which expresses $f$ in terms of $g$. Define
the Möbius function $\mu$ by $\mu(1) = 1$, $\mu(n) = 0$ if $n$ is not squarefree and
$\mu(p_1p_2 \cdots p_k) = (-1)^k$ if $p_1, p_2, \ldots, p_k$ are distinct prime numbers. Then
Möbius' inversion formula reads $f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) g(d)$. The main ingredient
in the proof is the fact that $\sum_{d \mid n} \mu(d) = 0$ for all $n > 1$ and it equals 1
for $n = 1$. This is immediate, since if $p_1, p_2, \ldots, p_k$ are the different primes
dividing $n$, then for $n > 1$

\[ \sum_{d \mid n} \mu(d) = 1 + \sum_{i} \mu(p_i) + \sum_{i<j} \mu(p_ip_j) + \cdots = 1 - \binom{k}{1} + \binom{k}{2} - \cdots = (1 - 1)^k = 0, \]

by the binomial formula. Using this observation, we can write

\[ \sum_{d \mid n} \mu\left(\frac{n}{d}\right) g(d) = \sum_{d \mid n} \mu(d)\, g\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{d_1 \mid \frac{n}{d}} f(d_1) = \sum_{d_1 \mid n} f(d_1) \sum_{d \mid \frac{n}{d_1}} \mu(d) = f(n). \]

The next problem requires the multiplicative version of Möbius's formula,
namely if $g(n) = \prod_{d \mid n} f(d)$, then $f(n) = \prod_{d \mid n} g(d)^{\mu(n/d)}$. The proof being
exactly the same as above, we leave it to the reader to fill in the details.
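The inversion formula is easy to test numerically. The following is a minimal sketch, not part of the original text: the test function `f` is an arbitrary choice made for illustration, and the helper names are hypothetical.

```python
def divisors(n):
    """All positive divisors of n (naive, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Moebius function computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divides the original n: not squarefree
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def f(n):
    # Arbitrary test function (an assumption made only for this illustration).
    return n * n + 1

def g(n):
    """g(n) = sum of f(d) over the divisors d of n."""
    return sum(f(d) for d in divisors(n))

def inverted(n):
    """Right-hand side of the inversion formula: sum of mu(n/d) g(d)."""
    return sum(mobius(n // d) * g(d) for d in divisors(n))
```

Calling `inverted(n)` should reproduce `f(n)` for every positive integer `n`.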


3. Let $(a_n)_{n \ge 1}$ be a sequence of positive integers such that

\[ \gcd(a_m, a_n) = a_{\gcd(m, n)} \]

for all positive integers $m, n$. Prove that there exists a unique sequence
of positive integers $(b_n)_{n \ge 1}$ such that $a_n = \prod_{d \mid n} b_d$ for all $n$.


Marcel Tena, Romanian TST 


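Proof. In the light of the previous discussion, we are forced to define

\[ b_n = \prod_{d \mid n} a_d^{\mu(n/d)}. \]

Of course, the hard point is to prove that $b_n$ is an integer. In order to prove
this, we will first transform the expression defining $b_n$, using the Mersenne
property¹ of the sequence $(a_n)_n$. Namely, let $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$, then

\[ b_n = \frac{a_n \prod_{i<j} a_{\frac{n}{p_ip_j}} \cdots}{\prod_{i} a_{\frac{n}{p_i}} \prod_{i<j<l} a_{\frac{n}{p_ip_jp_l}} \cdots}. \]

The key remark is that the Mersenne property yields the equality

\[ \gcd\left(a_{\frac{n}{p_i}}\right)_{i \in I} = a_{\frac{n}{\prod_{i \in I} p_i}} \]

for all subsets $I$ of $\{1, 2, \ldots, k\}$. Therefore we can write

\[ b_n = \frac{a_n \prod_{i<j} \gcd\left(a_{\frac{n}{p_i}}, a_{\frac{n}{p_j}}\right) \cdots}{\prod_{i} a_{\frac{n}{p_i}} \prod_{i<j<l} \gcd\left(a_{\frac{n}{p_i}}, a_{\frac{n}{p_j}}, a_{\frac{n}{p_l}}\right) \cdots}. \]

Finally, the inclusion-exclusion principle coupled with the local-global principle
easily yields the formula

\[ \frac{\prod_{i} x_i \prod_{i<j<k} \gcd(x_i, x_j, x_k) \cdots}{\prod_{i<j} \gcd(x_i, x_j) \prod_{i<j<k<l} \gcd(x_i, x_j, x_k, x_l) \cdots} = \operatorname{lcm}(x_1, x_2, \ldots, x_n) \]

for all nonzero integers $x_1, x_2, \ldots, x_n$. Using this observation, we deduce that

\[ b_n = \frac{a_n}{\operatorname{lcm}\left(a_{\frac{n}{p_1}}, a_{\frac{n}{p_2}}, \ldots, a_{\frac{n}{p_k}}\right)}, \]

which is clearly an integer (by the Mersenne property, if $m$ divides $n$ then $a_m$
divides $a_n$). The result follows. □

¹ See chapter 10, the introduction to problem 10.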
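As a sanity check, the closed form $b_n = a_n / \operatorname{lcm}(a_{n/p_1}, \ldots, a_{n/p_k})$ can be tried on a concrete sequence. The sketch below assumes the test sequence $a_n = 2^n - 1$, which is our choice for illustration (it satisfies $\gcd(a_m, a_n) = a_{\gcd(m,n)}$) and does not come from the text; the function names are hypothetical.

```python
from math import gcd, lcm, prod  # math.lcm requires Python 3.9+

def a(n):
    # Assumed test sequence: a_n = 2^n - 1 has gcd(a_m, a_n) = a_gcd(m, n).
    return 2**n - 1

def prime_divisors(n):
    """Distinct prime divisors of n by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def b(n):
    """b_n = a_n / lcm(a_{n/p}) over primes p | n (lcm of nothing is 1)."""
    denom = lcm(*[a(n // p) for p in prime_divisors(n)])
    assert a(n) % denom == 0      # integrality, as claimed in the proof
    return a(n) // denom

def check(limit):
    """Verify a_n = product of b_d over d | n for all n up to limit."""
    for n in range(1, limit + 1):
        divs = [d for d in range(1, n + 1) if n % d == 0]
        if prod(b(d) for d in divs) != a(n):
            return False
    return True
```

For this particular sequence the numbers `b(n)` are the values at 2 of the cyclotomic polynomials, for example `b(6) == 3` and `b(12) == 13`.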


Proof. Let $p$ be a prime. Note that if a sequence $(a_n)_{n \ge 1}$ has the hypothesized
property, then so does $a'_m = p^{v_p(a_m)}$. Further, if the desired conclusion holds
for all such $a'_m$, then it clearly holds for $a_m$. Thus we may assume $a_m = p^{u_m}$
and we see that the sequence $(u_n)_{n \ge 1}$ consists of nonnegative integers and
satisfies $\min(u_m, u_n) = u_{\gcd(m, n)}$.

Let $0 \le r_1 < r_2 < \cdots$ be the values actually taken on by the sequence
$(u_n)_{n \ge 1}$. For any two elements $m$ and $m'$ of the set $S_k = \{m : u_m = r_k\}$ we
have $u_{\gcd(m, m')} = \min(u_m, u_{m'}) = r_k$ and hence $\gcd(m, m') \in S_k$. Therefore $S_k$
has a unique minimal element $m_k$ and all elements of $S_k$ are multiples of $m_k$.
Note that if $j > k$, then $\min(u_{m_j}, u_{m_k}) = \min(r_j, r_k) = r_k$, so $\gcd(m_j, m_k) = m_k$
or $m_k \mid m_j$. Thus in the sequence $1 = m_1, m_2, m_3, \ldots$ each term divides
the later terms. Now suppose $m_k \mid m$ and $m_{k+1} \nmid m$. Then $\gcd(m_k, m) = m_k$,
so $\min(u_{m_k}, u_m) = r_k$ or $u_m \ge r_k$. However $\gcd(m_{k+1}, m) < m_{k+1}$, so
$\min(u_{m_{k+1}}, u_m) < r_{k+1}$ and hence $u_m < r_{k+1}$. Since the $r_i$ are all the values
taken on by $(u_n)$, we conclude that $u_m = r_k$.

To recap, the sequences $r_1 < r_2 < \cdots$ and $m_1, m_2, \ldots$ determine $u_m$
uniquely. If $k$ is the largest index for which $m_k \mid m$, then $u_m = r_k$. Define
$b_1 = p^{r_1}$, $b_{m_k} = p^{r_k - r_{k-1}}$ for $k \ge 2$, and $b_m = 1$ if $m$ is not one of the $m_k$.
Then we immediately see that if $k$ is the largest index for which $m_k \mid m$, then

\[ \prod_{d \mid m} b_d = \prod_{i=1}^{k} b_{m_i} = p^{r_1} \prod_{i=2}^{k} p^{r_i - r_{i-1}} = p^{r_k} = p^{u_m} = a_m. \] □

3.3 Legendre’s formula 


Legendre discovered a very beautiful and useful formula for $v_p(n!)$:

\[ v_p(n!) = \sum_{j \ge 1} \left\lfloor \frac{n}{p^j} \right\rfloor = \frac{n - s_p(n)}{p - 1}, \]

where $s_p(n)$ is the sum of digits of $n$ when written in base $p$. The proof of
the first equality is easy, since there are $\left\lfloor \frac{n}{p^j} \right\rfloor - \left\lfloor \frac{n}{p^{j+1}} \right\rfloor$ positive integers $x \le n$
such that $v_p(x) = j$. The second equality is a direct computation left to the
reader. The purpose of this section is to present some nice applications of this
formula.
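Both forms of Legendre's formula are straightforward to compute. The following sketch (function names are ours, chosen for illustration) compares the floor sum, the digit-sum expression, and a direct count of the exponent of $p$ in $n!$.

```python
from math import factorial

def vp(n, p):
    """Exponent of the prime p in the positive integer n."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def vp_factorial(n, p):
    """Legendre's formula: sum of floor(n / p^j) over j >= 1."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total

def digit_sum(n, p):
    """Sum of the digits of n written in base p."""
    s = 0
    while n:
        s += n % p
        n //= p
    return s

def vp_factorial_digits(n, p):
    """Second form of the formula: (n - s_p(n)) / (p - 1)."""
    return (n - digit_sum(n, p)) // (p - 1)
```

For instance, `vp_factorial(10, 2)` evaluates $5 + 2 + 1 = 8$, the exponent of 2 in $10!$.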

We present two solutions for the next problem: one uses the local-global 
principle combined with Legendre’s formula, while the second one is more 
exotic, but very powerful. 


4. Show that if $n$ is a positive integer and $a$ and $b$ are integers, then

\[ \frac{1}{n!}\, a(a + b)(a + 2b) \cdots (a + (n-1)b)\, b^{n-1} \in \mathbb{Z}. \]


IMO 1985 Shortlist 
Proof. We will prove that for any prime $p \le n$ we have

\[ v_p(n!) \le v_p\left(a(a + b) \cdots (a + (n-1)b)\, b^{n-1}\right), \]

which is enough to solve the problem. If $p$ divides $b$, things are rather clear, as
$v_p(n!) \le n - 1$ while $v_p(b^{n-1}) \ge n - 1$. So assume that $p$ does not divide $b$. But
then there are at least $\lfloor n/p \rfloor$ multiples of $p$ among $a, a+b, \ldots, a+(n-1)b$ (since
among $a, a+b, \ldots, a+(p-1)b$ there is at least one multiple of $p$, the same
for $a+pb, \ldots, a+(2p-1)b, \ldots$), at least $\lfloor n/p^2 \rfloor$ multiples of $p^2$ and so on. So,
the $p$-adic valuation of $a(a+b) \cdots (a+(n-1)b)$ is at least $\lfloor n/p \rfloor + \lfloor n/p^2 \rfloor + \cdots$
and this is exactly the $p$-adic valuation of $n!$, by Legendre's formula. □


Proof. Consider the matrix $A = (a_{ij})_{1 \le i, j \le n}$ where $a_{ij} = \binom{a + (i-1)b}{j}$. We
claim that we have

\[ \det(A) = \frac{b^{\frac{n(n-1)}{2}}}{n!} \cdot a(a + b) \cdots (a + (n-1)b). \]

To prove this, note that one can evaluate any determinant of the form

\[ \begin{vmatrix} P_1(x_1) & P_1(x_2) & \cdots & P_1(x_n) \\ P_2(x_1) & P_2(x_2) & \cdots & P_2(x_n) \\ \vdots & \vdots & & \vdots \\ P_n(x_1) & P_n(x_2) & \cdots & P_n(x_n) \end{vmatrix} \]

where $P_i$ are polynomials of degree at most $n - 1$, by multiplying the matrix
of the coefficients of the $P_i$'s by the Vandermonde matrix $\left(x_j^{i-1}\right)_{i,j}$ and then
taking the determinant. Using this observation, it is easy to prove the above
formula.

Now, since $\det(A)$ is an integer (as it is a polynomial expression with
integer coefficients in the entries of $A$), it follows that for any $p$ relatively
prime to $b$, the $p$-adic valuation of $a(a+b) \cdots (a+(n-1)b)$ is at least that
of $n!$. As for primes dividing $b$ the problem is easy (since $v_p(n!) \le n - 1$), the
result follows. □
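The statement of problem 4 can be confirmed by brute force on a small range. This is only an illustrative sketch; the function name and the tested range are our own choices.

```python
from math import factorial, prod

def quotient_is_integer(a, b, n):
    """Check that n! divides a(a+b)...(a+(n-1)b) * b^(n-1)."""
    numerator = prod(a + i * b for i in range(n)) * b ** (n - 1)
    return numerator % factorial(n) == 0

def check_range(bound, n_max):
    """Test all integers |a|, |b| <= bound and 1 <= n <= n_max."""
    return all(
        quotient_is_integer(a, b, n)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        for n in range(1, n_max + 1)
    )
```

Dropping the factor $b^{n-1}$ makes the check fail in general, for example for $a = 1$, $b = 2$, $n = 2$, where the product is $1 \cdot 3$.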


5. Prove that for all integers $a, b$ with $b \ne 0$ there exists a positive integer
$n$ such that $v_2(n!) \equiv a \pmod b$.


KoMaL 


Proof. If $s_2(n)$ is the sum of digits of $n$ when written in base 2, we need to
find $n$ such that $n - s_2(n) \equiv a \pmod b$. Choose $n = 2^{x_1} + 2^{x_2} + \cdots + 2^{x_k}$ with
$x_1 < x_2 < \cdots < x_k$. Then $s_2(n) = k$ and we need to ensure that

\[ 2^{x_1} - 1 + \cdots + 2^{x_k} - 1 \equiv a \pmod b. \]

Write $b = 2^r c$ with odd $c$ and choose $r < x_1 < x_2 < \cdots < x_k$ such that
$x_i \equiv 1 \pmod{\varphi(c)}$. Then $2^{x_i} - 1 \equiv 1 \pmod c$ and so

\[ 2^{x_1} - 1 + \cdots + 2^{x_k} - 1 \equiv k \pmod c. \]

Also, we have

\[ 2^{x_1} - 1 + \cdots + 2^{x_k} - 1 \equiv -k \pmod{2^r}. \]

Thus, it is enough to choose $k$ such that $k \equiv a \pmod c$ and $k \equiv -a \pmod{2^r}$,
which is possible by the Chinese Remainder Theorem. □
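The proof is constructive, so it can be turned into a short program. The sketch below follows the construction for $b \ge 1$; the specific choice of exponents $x_i = 1 + \varphi(c)(r + i)$ and the brute-force search for $k$ are our assumptions, made to keep the code short.

```python
from math import gcd

def euler_phi(c):
    """Euler's totient, computed naively."""
    return sum(1 for t in range(1, c + 1) if gcd(t, c) == 1)

def v2_factorial(n):
    """v_2(n!) = n - s_2(n) by Legendre's formula."""
    return n - bin(n).count("1")

def find_n(a, b):
    """Return n with v_2(n!) congruent to a modulo b (b >= 1)."""
    r, c = 0, b
    while c % 2 == 0:           # write b = 2^r * c with c odd
        c //= 2
        r += 1
    # k = a (mod c) and k = -a (mod 2^r); CRT guarantees a solution in 1..b.
    k = next(k for k in range(1, b + 1)
             if (k - a) % c == 0 and (k + a) % 2**r == 0)
    phi = euler_phi(c)
    # Distinct exponents x_i > r with x_i = 1 (mod phi(c)).
    exponents = [1 + phi * (r + i) for i in range(1, k + 1)]
    return sum(2**x for x in exponents)
```

The returned `n` is usually far from the smallest solution, since the construction only aims at existence.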


Remark 3.2. This problem admits a vast generalization, which appeared on the
IMO Shortlist 2007. Namely, we can prove that for any positive integer $d$ and
positive integers $b_1, b_2, \ldots, b_m$ there exist infinitely many positive integers $n$
such that $d \mid n - s_{b_i}(n)$ for all $i$. Here $s_b(n)$ is the sum of digits of $n$ when
written in base $b$.




Proof. Let $c_b(n)$ be the number of digits of $n$ in base $b$ and consider the
sequence defined by

\[ a_0 = d\, b_1b_2 \cdots b_m, \qquad a_{j+1} = a_j^{\,1 + \max_{1 \le i \le m} c_{b_i}(a_j)}. \]

It is trivial to check that $d$ divides $a_i$ for all $i$ and that

\[ s_{b_j}(a_{i_1} + a_{i_2} + \cdots + a_{i_k}) = s_{b_j}(a_{i_1}) + s_{b_j}(a_{i_2}) + \cdots + s_{b_j}(a_{i_k}) \]

for any distinct numbers $i_1, i_2, \ldots, i_k$.
Since the $m$-tuples

\[ S_i = \left(s_{b_1}(a_i) \bmod d,\ s_{b_2}(a_i) \bmod d,\ \ldots,\ s_{b_m}(a_i) \bmod d\right) \]

take only finitely many values, the pigeonhole principle yields the existence
of an $m$-tuple $S$ that repeats infinitely often, say $S = S_{i_0} = S_{i_1} = \cdots$ for an
infinite increasing sequence $i_0, i_1, \ldots$.

Let

\[ c_k = a_{i_{dk}} + a_{i_{dk+1}} + \cdots + a_{i_{dk+d-1}}. \]

We will prove that for all $k$ we have $d \mid c_k - s_{b_j}(c_k)$ for all $j$, which will end
the proof. Since clearly $d \mid c_k$, it remains to check that $d$ divides $s_{b_j}(c_k)$, which
follows from

\[ s_{b_j}(c_k) = s_{b_j}(a_{i_{dk}}) + s_{b_j}(a_{i_{dk+1}}) + \cdots + s_{b_j}(a_{i_{dk+d-1}}) \]

and

\[ s_{b_j}(a_{i_{dk}}) \equiv s_{b_j}(a_{i_{dk+1}}) \equiv \cdots \equiv s_{b_j}(a_{i_{dk+d-1}}) \pmod d. \] □


A trickier combination of the local-global principle and Legendre's formula
can be found in the following problem, which implies the classical inequality
$\operatorname{lcm}(1, 2, \ldots, n) \ge 2^{n-1}$. This nice observation as well as the problem
appeared in [31]. Amusingly, the same result appeared much before [31] in the
same journal, in the form of a proposed problem! There is also an elementary,
but more difficult proof of the fact that $\operatorname{lcm}(1, 2, \ldots, n) < 3^n$ for all $n$,
see for instance [38]. Using the prime number theorem, one can prove that
$\operatorname{lcm}(1, 2, \ldots, n)$ behaves like $e^n$.




6. Prove the identity 


\[ (n + 1) \operatorname{lcm}\left(\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}\right) = \operatorname{lcm}(1, 2, \ldots, n + 1) \]


for any positive integer n. 


Peter L. Montgomery, AMM E 2686 


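Proof. We will prove that for any prime $p$ the expressions in the two sides
have the same $p$-adic valuation. Note that

\[ (n + 1)\binom{n}{i} = (i + 1)\binom{n+1}{i+1}. \]

Let $k$ denote the number of times $p$ divides the right-hand side of the equality
from the problem statement, so $p^k \le n + 1 < p^{k+1}$. Taking $i = p^k - 1$, we
obtain that

\[ v_p\left((n + 1)\binom{n}{i}\right) \ge v_p(i + 1) = k, \]

which shows that the $p$-adic valuation of the left-hand side is at least $k$. The
opposite inequality is more delicate. Fix $0 \le i \le n$ and note that Legendre's
formula gives

\[ v_p\left((n + 1)\binom{n}{i}\right) = v_p(i + 1) + \sum_{r \ge 1} x_r, \qquad x_r = \left\lfloor \frac{n+1}{p^r} \right\rfloor - \left\lfloor \frac{i+1}{p^r} \right\rfloor - \left\lfloor \frac{n-i}{p^r} \right\rfloor. \tag{3.1} \]

Now, since

\[ \lfloor a + b \rfloor - \lfloor a \rfloor - \lfloor b \rfloor = \lfloor \{a\} + \{b\} \rfloor \in \{0, 1\} \]

for any real numbers $a, b$, we have $x_r \in \{0, 1\}$ for all $r$. Moreover, we have
$x_r = 0$ if $r > k$, since in this case $p^r > n + 1$. The crucial point is that for all
$r \le v_p(i + 1)$ we also have $x_r = 0$. Indeed, writing $i + 1 = p^r u$ for some integer
$u$, we have

\[ x_r = \left\lfloor \frac{n+1}{p^r} \right\rfloor - u - \left\lfloor \frac{n+1}{p^r} - u \right\rfloor = 0. \]

Putting these remarks together yields the inequality

\[ \sum_{r \ge 1} x_r \le k - v_p(i + 1), \]

which, combined with (3.1), yields the estimate

\[ v_p\left((n + 1)\binom{n}{i}\right) \le k, \]

establishing the opposite inequality. □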
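The identity of problem 6 is easy to confirm numerically for small $n$. A minimal sketch follows (the function names are ours).

```python
from math import comb, lcm  # math.lcm requires Python 3.9+

def left_side(n):
    """(n + 1) * lcm of the binomial coefficients C(n, 0), ..., C(n, n)."""
    return (n + 1) * lcm(*[comb(n, i) for i in range(n + 1)])

def right_side(n):
    """lcm(1, 2, ..., n + 1)."""
    return lcm(*range(1, n + 2))
```

Since every $\binom{n}{i}$ is at most $2^n$ in size but their least common multiple is $\operatorname{lcm}(1, \ldots, n+1)/(n+1)$, the identity is also a convenient way to see how fast $\operatorname{lcm}(1, \ldots, n)$ grows.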


We continue with a rather tricky diophantine equation. 


7. Solve in positive integers $x^{2007} - y^{2007} = x! - y!$.


Romania TST 2007 


Proof. We claim that there are only trivial solutions $x = y$. Suppose that
$x > y$ is a solution of the problem. We will distinguish two cases.

In the first case, assume that $y \le 2007$. If $y = 1$, then $x^{2007} = x!$ and
trivially $x = 1$ (otherwise $x - 1$ would divide $x^{2007}$, so $x - 1 = 1$, which is
clearly not possible). Thus $y > 1$ and we may choose a prime $p \mid y$. Then $p$
divides $y^{2007}$, $x!$, $y!$, so that $p \mid x$. But then

\[ 2007 \le v_p(x^{2007} - y^{2007}) = v_p\left(y!\,(x!/y! - 1)\right) = v_p(y!) < y, \]

a contradiction.

Now, assume that $y > 2007$. Then $x - y$ is a multiple of any prime $p$ smaller
than 2007 such that $\gcd(2007, p - 1) = 1$. Indeed, if $p$ is such a prime, then $p$
divides $x^{2007} - y^{2007} = x! - y!$. If $p$ divides $x$, then clearly it also divides $y$ and
so it divides $x - y$. If not, since $p$ divides $x^{p-1} - y^{p-1}$ and $\gcd(2007, p - 1) = 1$,
it follows that $p$ divides $x - y$. We deduce that $x > y + 2007$ in this case. But
then

\[ x! - y! = y!\,(x!/y! - 1) \ge 2007! \cdot (x - 1) \cdots (x - 2006) > x^{2007}, \]

again a contradiction. □




The next problem is also tricky. 


8. Prove that for all positive integers n different from 3 and 5, n! is divisible 
by the number of its positive divisors. 


Paul Erdős, Miklos Schweitzer Competition 


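Proof. Since the number of positive divisors of $n!$ is precisely $\prod_{p \le n}(1 + v_p(n!))$,
it suffices to prove the following

Lemma 3.3. Let $P_n$ be the set of prime numbers less than or equal to $n$. For
all $n \ne 3, 5, 7$ there exists an injective map $f : P_n \to \{1, 2, \ldots, n\}$ such that
$1 + v_p(n!)$ divides $f(p)$ for all $p \in P_n$.

Indeed, if this is true, then $\prod_{p \le n}(1 + v_p(n!))$ is a divisor of $\prod_{p \in P_n} f(p)$,
which divides $n!$, because $f$ is injective. Thus the problem is solved for
$n \ne 3, 5, 7$. For $n = 7$, we can check by hand that the result holds.

It remains to construct $f$. For $p \le \sqrt{n}$ simply choose $f(p) = 1 + v_p(n!)$.
Note that $f(p) \le n$ follows from Legendre's formula and for $p < q \le \sqrt{n}$
we have $\left\lfloor \frac{n}{p^j} \right\rfloor \ge \left\lfloor \frac{n}{q^j} \right\rfloor$ for all $j$, the inequality being strict for $j = 1$ (because
$\frac{n}{p} - \frac{n}{q} \ge \frac{n}{pq} > 1$), so $f$ is injective on these primes. For the other primes, we will define $f$ by induction. Assume
we defined $f$ for all primes $q < p$, where $p \in P_n$, $p > \sqrt{n}$ is given. There are

\[ \left\lfloor \frac{n}{1 + \left\lfloor \frac{n}{p} \right\rfloor} \right\rfloor \]

multiples of $1 + v_p(n!)$ in $\{1, 2, \ldots, n\}$ (we used the fact that $p > \sqrt{n}$ to obtain $v_p(n!) = \left\lfloor \frac{n}{p} \right\rfloor$).
If we manage to prove that this number is greater than $\pi(p) - 1$, which is the
number of occupied values for $f(p)$, we are done, since we can take for $f(p)$
any of the remaining values. However, note that $\pi(p) \le \frac{p+1}{2}$, with equality
only for $p = 3, 5, 7$. Let us evaluate $\left\lfloor \frac{n}{1 + \lfloor n/p \rfloor} \right\rfloor$. Write $n = kp + r$ for some
$0 \le r < p$, so that

\[ \left\lfloor \frac{n}{1 + \left\lfloor \frac{n}{p} \right\rfloor} \right\rfloor = \left\lfloor \frac{kp + r}{k + 1} \right\rfloor = p + \left\lfloor \frac{r - p}{k + 1} \right\rfloor > p - 1 + \frac{r - p}{k + 1} \ge p - 1 - \frac{p}{1 + k}. \]

So, for $k \ge 2$ we have

\[ \left\lfloor \frac{n}{1 + \left\lfloor \frac{n}{p} \right\rfloor} \right\rfloor > p - 1 - \frac{p}{3} \ge \frac{p - 1}{2} \ge \pi(p) - 1 \]

and we are done. For $k = 1$ it remains to see if we can ensure that $\left\lfloor \frac{n}{2} \right\rfloor > \frac{p - 1}{2}$.
But this trivially holds for $n > p$. So, the only obstruction is $n = p$ and
$\pi(p) = \frac{p+1}{2}$. As we observed, this only happens for $n = 3, 5, 7$, which is
excluded by the hypothesis. □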
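The statement of problem 8 can be tested directly, since the number of divisors of $n!$ is obtained from Legendre's formula. The following is a small sketch with hypothetical helper names; it lists the values of $n$ up to a bound for which the divisibility fails.

```python
from math import factorial

def primes_up_to(n):
    """Primes <= n by trial division (adequate for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(p**0.5) + 1))]

def vp_factorial(n, p):
    """Legendre's formula for the exponent of p in n!."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total

def divisor_count_of_factorial(n):
    """Number of positive divisors of n!."""
    count = 1
    for p in primes_up_to(n):
        count *= 1 + vp_factorial(n, p)
    return count

def exceptions(limit):
    """All n <= limit for which the divisor count does not divide n!."""
    return [n for n in range(1, limit + 1)
            if factorial(n) % divisor_count_of_factorial(n) != 0]
```

For example $3! = 6$ has 4 divisors and $5! = 120$ has 16 divisors, and neither count divides the factorial.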


For the next problem, we will need some basic estimates about prime 
numbers. We refer the reader to the addendum 3.A for proofs and more 
details. 


9. Let $n, k$ be positive integers such that $n > 9^k$. Prove that $\binom{n}{k}$ has at
least $k$ distinct prime factors.


Paul Erdos, Miklos Schweitzer Competition 


Proof. We will prove that for $n$ large enough, $\binom{n}{k}$ is a multiple of a product
of $k$ numbers that are pairwise relatively prime and greater than 1. This will
clearly imply that it has at least $k$ prime factors. Define $L_k = \operatorname{lcm}(1, 2, \ldots, k)$.
The key point of the proof is the following

Lemma 3.4. For $n \ge k + L_k$, $\binom{n}{k}$ is a multiple of $\prod_{i=n-k+1}^{n} \frac{i}{\gcd(i, L_k)}$.

If this happens, we are done, because the numbers

\[ \frac{n-k+1}{\gcd(n-k+1, L_k)},\ \frac{n-k+2}{\gcd(n-k+2, L_k)},\ \ldots,\ \frac{n}{\gcd(n, L_k)} \]

are pairwise relatively prime and greater than 1.
To prove the lemma, we will compare $p$-adic valuations for all primes $p$.
Note that the lemma is equivalent to

\[ k! \ \Big|\ \prod_{i=n-k+1}^{n} \gcd(i, L_k). \]

Thus, we need to prove that for all primes $p$ we have

\[ v_p(k!) \le \sum_{i=n-k+1}^{n} v_p(\gcd(i, L_k)). \tag{3.2} \]

Let $r = v_p(L_k) = \lfloor \log_p k \rfloor$. For all $j \le r$ we can find at least $\left\lfloor \frac{k}{p^j} \right\rfloor$ multiples of
$p^j$ among the $k$ consecutive numbers $n-k+1, n-k+2, \ldots, n$. Also, if $u$ is a
multiple of $p^j$ with $j \le r$, then so is $\gcd(L_k, u)$. We conclude that (3.2) follows
from Legendre's formula, as the only nonzero terms on the left-hand side are
of the form $\left\lfloor \frac{k}{p^j} \right\rfloor$ with $j \le r$.
Finally, it remains to prove that $k + L_k < 9^k$. Note that

\[ L_k = \prod_{p \le k} p^{\lfloor \log_p k \rfloor} = \prod_{p > \sqrt{k}} p \cdot \prod_{p \le \sqrt{k}} p^{\lfloor \log_p k \rfloor} < 4^k \cdot \prod_{p \le \sqrt{k}} k, \]

where here we used theorem 3.A.3. Thus

\[ L_k < k^{\sqrt{k}} \cdot 4^k < 9^k, \]

the last inequality being easy to prove. □
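For small $k$ the conclusion of problem 9 can be observed directly. The sketch below counts distinct prime factors of $\binom{n}{k}$ by trial division; the function names and the tested ranges are our own choices.

```python
from math import comb

def distinct_prime_factors(m):
    """Number of distinct primes dividing m, by trial division."""
    count, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 1
    return count + (1 if m > 1 else 0)

def holds(k, n_max):
    """Check the claim for every n with 9^k < n <= n_max."""
    return all(distinct_prime_factors(comb(n, k)) >= k
               for n in range(9**k + 1, n_max + 1))
```

The bound $9^k$ is far from sharp; it is simply what the estimate on $L_k$ delivers.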


3.4 Problems with combinatorial and 
valuation-theoretic aspects 


The problems in this section are fairly elementary, but none of them is 
easy. 


10. Let $n \ge 2$ and let $a_1, a_2, \ldots, a_n$ be positive integers, not all of them
equal. Prove that there are infinitely many prime numbers $p$ for which
there exists a positive integer $k$ such that

\[ p \mid a_1^k + a_2^k + \cdots + a_n^k. \]
Iranian Olympiad 2004 


Proof. By dividing all $a_i$'s by their greatest common divisor, we may assume
that they are relatively prime. Suppose that there are only finitely many
primes $p_1, p_2, \ldots, p_N$ such that all prime factors of $a_1^k + a_2^k + \cdots + a_n^k$ (where
$k$ varies over the positive integers) are among $p_1, p_2, \ldots, p_N$.

Assume that among $a_1, a_2, \ldots, a_n$ there are $b_i$ numbers not divisible by
$p_i$. Since $a_1, a_2, \ldots, a_n$ are relatively prime, we have $b_i \ge 1$. Consider

\[ k = 2 \prod_{i=1}^{N} \varphi\left(p_i^{v_{p_i}(b_i) + 1}\right). \]

Since $k \ge v_{p_i}(b_i) + 1$, we have $p_i^{v_{p_i}(b_i) + 1} \mid a_j^k$ whenever $p_i \mid a_j$. Using this and
Euler's theorem, we infer that

\[ a_1^k + a_2^k + \cdots + a_n^k \equiv b_i \pmod{p_i^{v_{p_i}(b_i) + 1}}. \]

Hence

\[ v_{p_i}(a_1^k + a_2^k + \cdots + a_n^k) = v_{p_i}(b_i) \]

for all $i$. Since all prime factors of $a_1^k + a_2^k + \cdots + a_n^k$ are among $p_1, p_2, \ldots, p_N$,
we deduce that

\[ a_1^k + a_2^k + \cdots + a_n^k = p_1^{v_{p_1}(b_1)} p_2^{v_{p_2}(b_2)} \cdots p_N^{v_{p_N}(b_N)}. \]

Now, at least one of the $a_i$'s is greater than 1, thus

\[ a_1^k + a_2^k + \cdots + a_n^k \ge 2^k > k > \prod_{i=1}^{N} p_i^{v_{p_i}(b_i)}. \]

The two relations are clearly contradictory and the problem is solved. □


A classical problem of Erdős is the following: if $a_1, a_2, \ldots, a_{n+1}$ are
different positive integers not exceeding $2n$, then one can find $i \ne j$ such that $a_i$
divides $a_j$. The idea is very simple and beautiful: the largest odd divisors of
the $a_i$'s form a sequence of $n + 1$ odd numbers between 1 and $2n - 1$, so there
must be two equal terms in this sequence. But then the corresponding $a_i$ and
$a_j$ have a quotient which is a power of 2. The next problem is a variation on
this classical gem.




11. Let $f(n)$ be the maximum size of a subset of $\{1, 2, \ldots, n\}$ which does
not contain two distinct elements $i, j$ such that $i \mid 2j$. Prove that there
exists a constant $C > 0$ such that for all $n$ we have

\[ \left| f(n) - \frac{4n}{9} \right| \le C \ln n. \]








Paul Erdős, AMM E 3403 


Proof. We will actually exhibit the optimal set with the property that it does
not contain two distinct elements $i, j$ with $i \mid 2j$. Defining

\[ A_n = \{a \in \mathbb{Z} \mid n/3 < a \le n,\ v_2(a) \equiv 0 \pmod 2\}, \]

it is easy to see that $A_n$ satisfies the conditions of the problem. Indeed, if
$i, j \in A_n$ and $i \mid 2j$, we must have $i \mid j$, since we cannot have $v_2(i) = v_2(j) + 1$ as
both $v_2(i)$ and $v_2(j)$ are even. So $j/i$ is either 1 or 2. It cannot be 2, because
in this case one of $v_2(i)$, $v_2(j)$ would be odd, so it has to be 1 and $i = j$.

Next, we prove that this set is optimal, in the sense that it has the maximal
number of elements among all sets satisfying the conditions of the problem.
Take any such set $A$ and fix $k$ such that 3 does not divide $k$ and $v_2(k)$ is even.
Look at all the numbers $k, 3k, 9k, \ldots$ and $2k, 6k, 18k, \ldots$. There is at most
one element of $A$ among the union of these numbers and by definition there
is exactly one element of $A_n$ among them. On the other hand, if one varies
$k$, the numbers $k, 3k, \ldots, 2k, 6k, \ldots$ form a partition of the positive integers.
This clearly implies that $A$ has at most as many elements as $A_n$.

Finally, we have to estimate the size of $A_n$. The elements of $A_n$ are exactly
the numbers of the form $4^j b$ with $b$ odd, $j \ge 0$ such that $\frac{n}{3 \cdot 4^j} < b \le \frac{n}{4^j}$. There
are approximately $\frac{n}{3 \cdot 4^j}$ odd numbers $b$ such that $\frac{n}{3 \cdot 4^j} < b \le \frac{n}{4^j}$, with an error
at most 2 if $4^j \le n$ and error at most $\frac{n}{3 \cdot 4^j}$ for $4^j > n$. The total error is thus
logarithmic in $n$ and since

\[ \sum_{j \ge 0} \frac{n}{3 \cdot 4^j} = \frac{4n}{9}, \]

we have

\[ |A_n| = \frac{4n}{9} + O(\log n), \]

finishing the proof. □
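The set $A_n$ from the proof can be built explicitly and compared with $4n/9$. The sketch below is illustrative only: the helper names are ours, and the explicit error bound $\log_2 n + 3$ used in the check is a convenient safe choice, not a constant taken from the text.

```python
from math import log2

def v2(a):
    """Exponent of 2 in the positive integer a."""
    count = 0
    while a % 2 == 0:
        a //= 2
        count += 1
    return count

def build_A(n):
    """Integers a with n/3 < a <= n and v_2(a) even."""
    return [a for a in range(1, n + 1) if 3 * a > n and v2(a) % 2 == 0]

def is_admissible(s):
    """True if s has no two distinct elements i, j with i | 2j."""
    return not any(i != j and (2 * j) % i == 0 for i in s for j in s)

def size_error(n):
    """Distance between |A_n| and 4n/9."""
    return abs(len(build_A(n)) - 4 * n / 9)
```

For $n = 9$ the set is $\{4, 5, 7, 9\}$, of size exactly $4n/9 = 4$.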




12. Find all positive integers $n$ with the following property: there exist
natural numbers $b_1, b_2, \ldots, b_n$, not all equal and such that the number
$(b_1 + k)(b_2 + k) \cdots (b_n + k)$ is a power of an integer for each natural
number $k$. Here, a power means a number of the form $x^y$ with $x, y > 1$.


Russia 2008 


Proof. There are some obvious solutions: for instance, if $n$ is composite, say
$n = ab$ with $a, b > 1$, then we can choose $b_1 = b_2 = \cdots = b_a = 2$ and all the
other $b_i$'s equal to 1. Then for any $k$ we have

\[ (b_1 + k)(b_2 + k) \cdots (b_n + k) = (k + 2)^a (k + 1)^{ab - a}, \]

which is certainly a power.

So, the real question is to decide whether numbers $b_1, b_2, \ldots, b_n$ can exist
when $n$ is a prime. It turns out that the answer is negative. Assume that such
numbers existed and let $c_1, c_2, \ldots, c_N$ be the set of distinct numbers among
$b_1, b_2, \ldots, b_n$, with multiplicities $m_1, m_2, \ldots, m_N$. By assumption, we have
$N > 1$ and clearly $n = m_1 + m_2 + \cdots + m_N$. Moreover, we know that for
any $k$, the number $(c_1 + k)^{m_1} (c_2 + k)^{m_2} \cdots (c_N + k)^{m_N}$ is a perfect power.
The key point is to choose numbers $k$ for which we can find distinct primes
$p_1, p_2, \ldots, p_N$ such that $v_{p_i}(c_j + k) = 1$ if $i = j$ and 0 otherwise. In this case,
if

\[ (c_1 + k)^{m_1} (c_2 + k)^{m_2} \cdots (c_N + k)^{m_N} = x^y \]

for some $x, y > 1$, we have $y\, v_{p_i}(x) = m_i$, so that $y$ divides all $m_i$. But then
$y$ divides their sum, which is $n$ and since $n$ is a prime, it follows that $n = y$.
Thus $n = y$ will divide all $m_i$ and this obviously contradicts the fact that
$N > 1$ and $m_1 + m_2 + \cdots + m_N = n$.

Thus, we are done if we can find distinct primes $p_1, p_2, \ldots, p_N$ and $k$ such
that $v_{p_i}(c_j + k) = 1$ if $i = j$ and 0 otherwise. This is very simple: first,
we choose some distinct prime numbers $p_1, p_2, \ldots, p_N$ sufficiently large, say
not dividing any of the numbers $c_i - c_j$ with $i \ne j$, and then choose $k$ such
that $k + c_i \equiv p_i \pmod{p_i^2}$ for all $i$. Such $k$ exists by the Chinese Remainder
Theorem. Of course, $v_{p_i}(k + c_i) = 1$ and for $j \ne i$ we cannot have $p_i \mid c_j + k$,
since otherwise $p_i$ would divide $c_i - c_j$, contradicting the choice of $p_i$. Thus,
such $k$ satisfies all desired conditions and the answer to the problem is that $n$
must be composite. □


A nice mixture of valuation-theoretic arguments and pigeonhole principle 
can be found in the following problem. 


13. Let $a$ be a positive integer. Prove that the set of prime divisors of $2^{2^n} + a$
for $n = 1, 2, \ldots$ is infinite.
Iranian TST 2009 
Proof. Assuming the contrary, let $p_1, p_2, \ldots, p_N$ be such that all prime factors
of $2^{2^n} + a$ are among $p_1, p_2, \ldots, p_N$ for all $n$. Pick a large number $r$ such that
$2^r > a^{2^{N+1}} + a$ and $n_0$ such that $2^{2^{n_0}} + a > (p_1p_2 \cdots p_N)^r$. Then for all $n > n_0$
we have

\[ (p_1p_2 \cdots p_N)^r < 2^{2^n} + a = \prod_{i=1}^{N} p_i^{v_{p_i}(2^{2^n} + a)}, \]

so that we can find $1 \le i \le N$ with $v_{p_i}(2^{2^n} + a) > r$. This $p_i$ depends
on the choice of $n$, but among the indices $i$ associated to the numbers
$n = n_0 + 1, n_0 + 2, \ldots, n_0 + N + 1$ there will be two identical ones. Thus we can write
$p_i^r \mid 2^{2^n} + a$ and $p_i^r \mid 2^{2^{n+m}} + a$ for some $n > n_0$, some $1 \le m \le N + 1$ and some
$1 \le i \le N$. But then $2^{2^n} \equiv -a \pmod{p_i^r}$, so that $2^{2^{n+m}} \equiv a^{2^m} \pmod{p_i^r}$
and $p_i^r \mid a^{2^m} + a$. In particular,

\[ a^{2^{N+1}} + a \ge a^{2^m} + a \ge p_i^r \ge 2^r, \]

contradicting the choice of $r$. The result follows. □


We continue with two more unusual problems. 


14. Let p1, p2,..., Pk be distinct prime numbers and let S be the set of 
positive integers all of whose prime factors are among p1, p2,..., pk. If 
A is a finite set of integers, let G(A) be the graph whose set of vertices 
is A, two vertices a,b € A being connected if a — b € S. Is it true that 
for all m > 3 we can find A with m elements such that 




a) the graph G(A) is complete? 
b) the graph G(A) is connected with all vertices of degree at most 2? 


Miklos Schweitzer, Competition 2009 


Proof. The answer to the first question is negative. Let $p$ be the smallest prime
different from $p_1, p_2, \ldots, p_k$ and suppose that $G(A)$ is complete for some finite
set $A$ with $|A| > p$. Then there exist distinct $a, b \in A$ such that $p$ divides
$a - b$, so that $a$ and $b$ are not connected.

On the other hand, the answer to the second question is positive! We will
construct a set $A$ with $m$ elements $a_1, a_2, \ldots, a_m$ such that $G(A)$ is a simple
path. It is enough to ensure that for all $1 \le n < m$, $a_{n+1} - a_n \in S$ and
$a_{n+2} - a_n, a_{n+3} - a_n, \ldots$ are not in $S$. But we can choose

\[ a_n = p_1p_2 \cdots p_k + (p_1p_2 \cdots p_k)^2 + \cdots + (p_1p_2 \cdots p_k)^n. \]

Then clearly $a_{n+1} - a_n \in S$. On the other hand, for any $i \ge 2$ we have
$v_{p_j}(a_{n+i} - a_n) = n + 1$ for all $1 \le j \le k$. Since $a_{n+i} - a_n > (p_1p_2 \cdots p_k)^{n+1}$, it
follows that $a_{n+i} - a_n$ is not in $S$ and the result follows. □
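The path construction in part b) can be checked mechanically for a given list of primes. In the sketch below the function names are ours, and two vertices are taken to be connected when the absolute value of their difference lies in $S$.

```python
def in_S(x, primes):
    """True if the positive integer x has all prime factors among primes."""
    if x <= 0:
        return False
    for p in primes:
        while x % p == 0:
            x //= p
    return x == 1

def path_vertices(primes, m):
    """a_n = P + P^2 + ... + P^n for n = 1..m, where P is the product of primes."""
    P = 1
    for p in primes:
        P *= p
    vertices, total = [], 0
    for n in range(1, m + 1):
        total += P**n
        vertices.append(total)
    return vertices

def is_simple_path(primes, m):
    """Check that a_i and a_j are connected exactly when |i - j| = 1."""
    a = path_vertices(primes, m)
    for i in range(m):
        for j in range(i + 1, m):
            if in_S(a[j] - a[i], primes) != (j == i + 1):
                return False
    return True
```

With the primes 2 and 3 the first vertices are 6, 42, 258, and for instance $258 - 6 = 252 = 36 \cdot 7$ is not in $S$.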


15. Let $m$ and $n$ be positive integers such that $m + 1, m + 2, \ldots, m + n$
are composite numbers and $m > n^{n-1}$. Prove that we can find pairwise
distinct prime numbers $p_1, p_2, \ldots, p_n$ such that $p_i$ divides $m + i$ for all
$1 \le i \le n$.


Tuymaada Olympiad 2004 


Proof. First, we will deal with those $i$ such that $m + i$ has at most $n - 1$ prime
factors. For such $i$ choose a prime $p_i$ for which $p_i^{v_{p_i}(m+i)}$ is maximal (note that
$p_i$ is unique with this property). Since $m + i > n^{n-1}$ and since $m + i$ has at
most $n - 1$ prime factors, we have $p_i^{v_{p_i}(m+i)} > n$. We claim that if $i \ne j$ and
$m + i$, $m + j$ have at most $n - 1$ prime factors, then $p_i \ne p_j$. Indeed, assume
that $p_i = p_j = p$. Then $\min\left(p^{v_p(m+i)}, p^{v_p(m+j)}\right)$ divides $m + i$ and $m + j$ and
moreover is greater than $n$. But any common divisor of $m + i$ and $m + j$
divides $j - i$ and so it is smaller than $n$. This shows that $i \mapsto p_i$ is injective.

It is now easy to conclude: make a list with those numbers $m + i$ having
at most $n - 1$ prime factors. For each of them, the previous paragraph yields a
prime factor $p_i$ and the associated $p_i$'s are distinct. If all $m + i$ are in the list,
we are done, otherwise successively pick remaining numbers and choose one
of their prime factors which is not among the $p_i$'s or among primes previously
selected. This is possible at every step, since any $m + i$ not in the list has at
least $n$ prime factors. □


3.5 Lifting exponent lemma 


This section is devoted to some applications of the following useful result: 


Theorem 3.5. (Lifting exponent lemma) Let $p$ be an odd prime and let $a$
and $b$ be integers relatively prime to $p$, such that $p \mid a - b$. Then for all positive
integers $n$

\[ v_p(a^n - b^n) = v_p(n) + v_p(a - b). \]

Proof. Consider first the case $v_p(n) = 0$. We need to prove that $p$ does not
divide $\frac{a^n - b^n}{a - b}$, which is clear, as by hypothesis

\[ \frac{a^n - b^n}{a - b} = a^{n-1} + a^{n-2}b + \cdots + b^{n-1} \equiv na^{n-1} \pmod p \]

and $p$ does not divide $na^{n-1}$. Next, we prove the result for $n = p$; we need
to check that $p$ divides exactly once into $a^{p-1} + \cdots + b^{p-1}$. Write $b = a + pk$
for some integer $k$. Then by the binomial formula we have $b^i \equiv a^i + ia^{i-1}pk \pmod{p^2}$,
so that

\[ \frac{a^p - b^p}{a - b} = \sum_{i=0}^{p-1} a^{p-1-i} b^i \equiv \sum_{i=0}^{p-1} \left(a^{p-1} + ipka^{p-2}\right) = pa^{p-1} + \frac{p-1}{2}\, p^2 k a^{p-2} \equiv pa^{p-1} \pmod{p^2}. \]

Let us apply the case $n = p$ to $a^{n/p}$ and $b^{n/p}$ (note that they still satisfy the
hypotheses of the problem) to get $v_p(a^n - b^n) = 1 + v_p(a^{n/p} - b^{n/p})$. The result
follows now by an immediate induction on $v_p(n)$. □
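The lifting exponent lemma lends itself to a quick numerical check. The following sketch (helper names and tested ranges are our own) verifies the formula for a few odd primes and all admissible pairs below a bound.

```python
def vp(n, p):
    """Exponent of the prime p in the nonzero integer n."""
    n, count = abs(n), 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def lte_holds(p, a, b, n):
    """Check v_p(a^n - b^n) = v_p(n) + v_p(a - b)."""
    return vp(a**n - b**n, p) == vp(n, p) + vp(a - b, p)

def check_lte(primes=(3, 5, 7), bound=40, n_max=30):
    """Test all 1 <= b < a < bound with p | a - b and p not dividing a."""
    for p in primes:
        for a in range(1, bound):
            for b in range(1, a):
                if a % p == 0 or (a - b) % p != 0:
                    continue
                if not all(lte_holds(p, a, b, n) for n in range(1, n_max + 1)):
                    return False
    return True
```

For example, with $p = 3$, $a = 4$, $b = 1$, $n = 9$ one gets $v_3(4^9 - 1) = 2 + 1 = 3$.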




The reader might wonder what happens for p = 2. There is of course a 
version of the theorem for p = 2, but the formula is more complicated. The 
proof is however much easier. 


Theorem 3.6. Let $x, y$ be odd integers and let $n$ be an even positive integer.
Then

\[ v_2(x^n - y^n) = v_2\left(\frac{x^2 - y^2}{2}\right) + v_2(n). \]

Proof. Write $n = 2^k a$ for some odd number $a$. Then

\[ x^n - y^n = (x^a - y^a)(x^a + y^a)(x^{2a} + y^{2a}) \cdots \left(x^{2^{k-1}a} + y^{2^{k-1}a}\right). \]

Now observe that if $u, v$ are odd numbers, then $u^2 + v^2 \equiv 2 \pmod 4$. Thus

\[ v_2(x^n - y^n) = v_2(x^{2a} - y^{2a}) + k - 1. \]

Finally, since $a, x, y$ are odd, it is easy to see that $\frac{x^{2a} - y^{2a}}{x^2 - y^2}$ is odd. The result
follows. □
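The case $p = 2$ can be tested in the same spirit. This is a minimal sketch with hypothetical function names, checking the formula for odd $x > y \ge 1$ and even $n$ in a small range.

```python
def v2(n):
    """Exponent of 2 in the nonzero integer n."""
    n, count = abs(n), 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def two_adic_lte_holds(x, y, n):
    """Check v_2(x^n - y^n) = v_2((x^2 - y^2) / 2) + v_2(n)."""
    return v2(x**n - y**n) == v2((x * x - y * y) // 2) + v2(n)

def check_two_adic(bound=41, n_max=20):
    """Test all odd 1 <= y < x <= bound and even 2 <= n <= n_max."""
    return all(
        two_adic_lte_holds(x, y, n)
        for x in range(3, bound + 1, 2)
        for y in range(1, x, 2)
        for n in range(2, n_max + 1, 2)
    )
```

The smallest case is $x = 3$, $y = 1$, $n = 2$, where both sides equal 3.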


An easy consequence of these results is the following estimate. It is much 
weaker than the previous theorems and can be proved directly by induction, 
too. 


Corollary 3.7. If $a, b$ are integers and $p$ is any prime dividing $a - b$, then for
all $n$ we have

\[ v_p(a^n - b^n) \ge v_p(a - b) + v_p(n). \]
The next problem is an immediate consequence of this corollary. 
16. Let $a, b, c$ be positive integers such that $c \mid a^c - b^c$. Prove that $c \mid \frac{a^c - b^c}{a - b}$.

I. Niven, AMM E 564

Proof. First of all, note that $\frac{a^c - b^c}{a - b}$ is an integer, so the statement makes sense.
Let $p$ be a prime dividing $c$. We will prove that $v_p(c) \le v_p\left(\frac{a^c - b^c}{a - b}\right)$. If $p$ does
not divide $a - b$, this follows from the hypothesis that $c$ divides $a^c - b^c$, so we
may assume that $p$ divides $a - b$. But then everything follows from corollary
3.7. □




The next problem appeared as a Chinese TST 2009 problem, but the 
result is much older (see [61]). 


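17. Let $n$ be a positive integer and let $a > b > 1$ be integers such that $b$ is
odd and $b^n \mid a^n - 1$. Prove that $a^b > \frac{3^n}{n}$.

M.B. Nathanson
Proof. Take any prime factor $p$ of $b$. Since $b$ is odd, we have $p > 2$. Note that

\[ n \le v_p(b^n) \le v_p(a^n - 1) \le v_p\left(a^{(p-1)n} - 1\right) \le v_p(a^{p-1} - 1) + v_p(n). \]

We deduce that

\[ a^b > a^{p-1} - 1 \ge p^{v_p(a^{p-1} - 1)} \ge p^{n - v_p(n)} \ge \frac{p^n}{n} \ge \frac{3^n}{n}, \]

and the result follows. □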


Remark 3.8. Using a generalization of the deep Thue-Siegel theorem, Mahler 
proved in [48] the following result: if a,b, u,v are nonzero integers with u > 
v > 1, then there are only finitely many positive integers n such that 


\[ au^n \equiv b \pmod{v^n}. \]


It is quite rare to see two very similar problems at the IMO; nevertheless 
the following problem appeared in weaker forms at the IMO in 1990 and 1999. 


18. Find all primes $p$ and all positive integers $n$ such that $n^{p-1}$ divides
$(p - 1)^n + 1$.
After IMO 1990 and 1999 


Proof. Let $p$ and $n$ be as in the statement. Note that if $p = 2$, then $n = 1$ or
$n = 2$. From now on, we assume that $p > 2$. If $n$ is even, then 4 divides $n^{p-1}$,
but it does not divide $(p - 1)^n + 1$, a contradiction. So, $n$ is odd. Let $q$ be the
smallest prime factor of $n$. Since

\[ (p - 1)^n \equiv -1 \quad \text{and} \quad (p - 1)^{q-1} \equiv 1 \pmod q, \]

the facts that $n$ is odd and $\gcd(n, q - 1) = 1$ imply that $p - 1 \equiv -1 \pmod q$,
or $p = q$.
By the lifting exponent lemma (using that $n$ is odd) we have

\[ (p - 1)v_p(n) = v_p(n^{p-1}) \le v_p((p - 1)^n + 1) = 1 + v_p(n). \]

Thus $(p - 2)v_p(n) \le 1$. In particular, $p = 3$ and $v_p(n) = 1$. Write $n = 3a$
with $\gcd(a, 3) = 1$ and observe that $a^2$ divides $8^a + 1$. We claim that $a = 1$.
Otherwise, let $r$ be the smallest prime factor of $a$, so that

\[ 8^a \equiv -1 \quad \text{and} \quad 8^{r-1} \equiv 1 \pmod r, \]

whence $8 \equiv -1 \pmod r$ as $\gcd(a, r - 1) = 1$. But then $r = 3$, which is
impossible. It follows that $a = 1$ and $n = 3$. □
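A brute-force search over a small range agrees with this analysis. The sketch below is only an illustration (the function names and search bounds are our choices); it uses modular exponentiation so that the large powers never have to be written out.

```python
def is_prime(p):
    """Primality test by trial division."""
    return p >= 2 and all(p % q for q in range(2, int(p**0.5) + 1))

def solutions(p_max, n_max):
    """Pairs (p, n) with p prime and n^(p-1) dividing (p-1)^n + 1."""
    found = []
    for p in range(2, p_max + 1):
        if not is_prime(p):
            continue
        for n in range(1, n_max + 1):
            modulus = n ** (p - 1)
            # pow with a modulus of 1 returns 0, which handles n = 1.
            if (pow(p - 1, n, modulus) + 1) % modulus == 0:
                found.append((p, n))
    return found
```

Besides the pairs $(p, 1)$, which work for every prime $p$, the search returns only $(2, 2)$ and $(3, 3)$.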


Remark 3.9. Another interesting problem that can be solved using the same
ideas is the following one: find all positive integers $a$ and $b$ such that $a^b$ divides
$b^a - 1$.


19. Find all positive integers a, b,c such that 


\[ (2^a - 1)(3^b - 1) = c!. \]
Gabriel Dospinescu, Mathematical Reflections 


Proof. We will only give the key ideas, since the computations are a bit tedious.
Take a solution $(a, b, c)$ with $c > 3$ and note that $a$ is even, since 3 divides
$2^a - 1$. Also, as 4 divides $c!$, 4 must divide $3^b - 1$ and so $b$ is even. Then, using
the lifting exponent lemma, we deduce that

\[ \frac{c - s_3(c)}{2} = v_3(c!) = v_3(2^a - 1) = v_3(4^{a/2} - 1) = 1 + v_3(a). \]

Similarly, by writing $b = 2^k r$ with $r$ odd, we have

\[ v_2(3^b - 1) = v_2\left(\frac{9^r - 1}{2}\right) + v_2(b) = v_2\left(4(9^{r-1} + 9^{r-2} + \cdots + 1)\right) + v_2(b) = 2 + v_2(b). \]

Thus

\[ c - s_2(c) = v_2(c!) = v_2(3^b - 1) = 2 + v_2(b). \]

We deduce that

\[ a \ge 3^{v_3(a)} \ge 3^{\frac{c}{2} - 1 - \log_3(c) - 1} = \frac{3^{\frac{c}{2} - 2}}{c} \]

and

\[ b \ge 2^{v_2(b)} \ge 2^{c - 1 - \log_2 c - 2} = \frac{2^{c-3}}{c}. \]

From here it follows that

\[ c! \ge \left(2^{\frac{3^{c/2 - 2}}{c}} - 1\right)\left(3^{\frac{2^{c-3}}{c}} - 1\right). \]

It is not difficult (even if it is quite tedious) to check that this does not hold
for any $c \ge 12$. We deduce that in any solution we have $c \le 11$. We can easily
exclude the cases $c \in \{8, 9, 10, 11\}$, since in this case we obtain $v_2(b) \ge 5$,
forcing $b \ge 32$, which is too large. For $c \in \{4, 5, 6, 7\}$, we have $c! > 3^b - 1$ for
only one value of $b$ with $v_2(b) = c - s_2(c) - 2$, so we simply check to see if there is
a compatible value of $a$. Combining these with the obvious solutions for $c \le 3$,
we end up with $(a, b, c) \in \{(1, 1, 2), (2, 1, 3), (2, 2, 4), (4, 2, 5), (6, 4, 7)\}$. □
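Since the proof bounds $c$ by 11, the list of solutions can be recovered by a finite search. The following sketch is illustrative (the function name and default bounds are ours); the bounds on $a$ and $b$ are chosen so that $2^a$ and $3^b$ already exceed $12!$, which makes the search exhaustive for $c \le 12$.

```python
from math import factorial

def solutions(c_max=12, a_max=30, b_max=20):
    """All (a, b, c) in the search box with (2^a - 1)(3^b - 1) = c!."""
    return sorted(
        (a, b, c)
        for c in range(1, c_max + 1)
        for a in range(1, a_max + 1)
        for b in range(1, b_max + 1)
        if (2**a - 1) * (3**b - 1) == factorial(c)
    )
```

The largest solution corresponds to $63 \cdot 80 = 5040 = 7!$.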


20. Let $a$ be a fixed positive integer. Prove that the equation $n! = a^b - a^c$
has only finitely many solutions $(n, b, c)$ in positive integers.


Chinese TST 2004 


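Proof. Let p be an odd prime not dividing a. Then by the lifting exponent
lemma we have, for every positive integer m,

\[ v_p(a^{m(p-1)} - 1) = v_p(a^{p-1} - 1) + v_p(m). \]

Taking m = b − c and noting that $v_p(n!) > \frac{n}{p} - 1$ and that
$v_p(n!) = v_p(a^{b-c} - 1) \le v_p(a^{(b-c)(p-1)} - 1)$, we conclude that

\[ v_p(b - c) \ge v_p(n!) - v_p(a^{p-1} - 1) > \frac{n}{p} - K \]

for some constant K, independent of n. Letting $\varepsilon = p^{-K} > 0$, we conclude
that $b - c > \varepsilon p^{n/p}$ for all n. Thus

\[ n^n > n! = a^b - a^c \ge a^{b-c} > a^{\varepsilon p^{n/p}}. \]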


3.5. Lifting exponent lemma 115 


Taking logarithms, we deduce that n is bounded in terms of a.
Since c, b − c < n!, the conclusion follows. □
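
The form of the lifting exponent lemma used in this proof is easy to test numerically. The sketch below is only an illustration (the helper names `vp` and `lte_holds` are ours); it checks the identity $v_p(a^{m(p-1)} - 1) = v_p(a^{p-1} - 1) + v_p(m)$ for an odd prime p not dividing a.

```python
def vp(n, p):
    """p-adic valuation of a nonzero integer n."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def lte_holds(a, p, m):
    """Check v_p(a^(m(p-1)) - 1) == v_p(a^(p-1) - 1) + v_p(m).

    Assumes p is an odd prime that does not divide a, and m >= 1.
    """
    left = vp(a ** (m * (p - 1)) - 1, p)
    right = vp(a ** (p - 1) - 1, p) + vp(m, p)
    return left == right

print(all(lte_holds(a, p, m)
          for p in (3, 5, 7)
          for a in range(2, 11) if a % p != 0
          for m in range(1, 31)))
```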


We continue with a very beautiful, but hard problem. 


21. Let x, y be relatively prime integers greater than 1. Prove that for
infinitely many primes p, the exponent of p in $x^{p-1} - y^{p-1}$ is odd.


Barry Powell, AMM E 2948 


Proof. Choose any integer k ≥ 2. It is a well-known result of Fermat that
the equations $a^4 + b^4 = c^2$ and $a^4 + b^4 = 2c^2$ have only trivial solutions.
Therefore $x^{2^k} + y^{2^k}$ is neither a perfect square nor twice a perfect square,
whence there is some odd prime p such that $v_p(x^{2^k} + y^{2^k})$ is odd. Since
gcd(x, y) = 1, p cannot divide xy. Since p divides $x^{2^{k+1}} - y^{2^{k+1}}$ and does not
divide $x^{2^k} - y^{2^k}$, the order of x/y mod p is $2^{k+1}$ and $2^{k+1}$ divides p − 1. Hence
the lifting exponent lemma gives

\[ v_p(x^{p-1} - y^{p-1}) = v_p(x^{2^{k+1}} - y^{2^{k+1}}) = v_p(x^{2^k} + y^{2^k}) \equiv 1 \pmod 2. \]

The result follows by taking successively k = 3, 4, ... and observing that the
prime p associated to k in the previous discussion satisfies $p \ge 1 + 2^{k+1}$. □


Proof. We will argue by contradiction: assume that there exists N > |x − y|
such that for all primes p > N we have

\[ v_p(x^{p-1} - y^{p-1}) \equiv 0 \pmod 2. \]

We claim that for all primes p > N the number $\frac{x^p - y^p}{x - y}$ is a perfect square. Take a
prime factor q of $\frac{x^p - y^p}{x - y}$. Since q does not divide x (otherwise q divides y,
contradicting the fact that gcd(x, y) = 1) and since q divides $x^p - y^p$, the
order of x/y mod q is 1 or p. If q divides x − y, then q divides $p x^{p-1}$ and so
q = p, contradicting the fact that p > N > |x − y|. So the order of x/y mod
q is p and so p divides q − 1. In particular q > p > N, and the lifting exponent
lemma implies that $v_q(x^p - y^p) = v_q(x^{q-1} - y^{q-1})$ is an even number, finishing
the proof of the claim.

Next, take a prime r ≡ 3 (mod 4) dividing 4xy − 1 and (using Dirichlet's
theorem) take a prime p ≡ −1 (mod r − 1) larger than N. Thus there exists
z such that

\[ z^2 = \frac{x^p - y^p}{x - y} \equiv \frac{x^{-1} - y^{-1}}{x - y} = -\frac{1}{xy} \equiv -4 \pmod r, \]

i.e. −1 is a quadratic residue mod r, a contradiction. □


3.6 p-adic techniques 


For more details on the results and techniques used in this section, the 
reader is (strongly) advised to read the addendum of this chapter, which is a 
modest introduction to p-adic numbers. Before passing to the next problem, 
let us recall the notion of congruence for rational numbers. Let p be a prime. 
The localization of Z with respect to the ideal pZ is the set

\[ \mathbb{Z}_{(p)} = \left\{ \frac{a}{b} \in \mathbb{Q} \;\middle|\; \gcd(a, b) = 1,\ \gcd(b, p) = 1 \right\}. \]

It is easy to check that $\mathbb{Z}_{(p)}$ is a subring of Q. If $a, b \in \mathbb{Z}_{(p)}$, we write $a \equiv b$
(mod $p^n$) if $a - b \in p^n \cdot \mathbb{Z}_{(p)}$. This is equivalent to the fact that the numerator
of the fraction a − b, when written in lowest terms, is a multiple of $p^n$. It is
easy to see that this is an equivalence relation, satisfying the usual properties
of congruences over Z. Moreover, we have a natural morphism f of rings
$\mathbb{Z}_{(p)} \to \mathbb{Z}/p\mathbb{Z}$, sending $\frac{a}{b}$ to $a \cdot b^{-1}$ ($b^{-1}$ being the inverse of b mod p). Then
for $x \in \mathbb{Z}_{(p)}$ we have x ≡ 0 (mod p) if and only if f(x) = 0. This observation
is very useful when trying to prove congruences for rational numbers. For
instance, let p be a prime greater than 3 and consider the element

\[ x = \sum_{k=1}^{p-1} \frac{1}{k^2} \]

of $\mathbb{Z}_{(p)}$. We have

\[ f(x) = \sum_{k=1}^{p-1} k^{-2} = \sum_{k=1}^{p-1} k^2 = 0, \]

since the map $x \mapsto x^{-1}$ is a bijection of $(\mathbb{Z}/p\mathbb{Z})^*$ and

\[ \sum_{k=1}^{p-1} k^2 = \frac{(p-1)p(2p-1)}{6} \]

is a multiple of p. Thus the numerator of the fraction $\sum_{k=1}^{p-1} \frac{1}{k^2}$, when written
in lowest terms, is a multiple of p. The next problem is a variation on this
idea combined with some ideas from p-adic analysis.
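
The example above can be checked with exact rational arithmetic. This is a small sketch, assuming that the element considered is $\sum_{k=1}^{p-1} 1/k^2$; the function name is ours.

```python
from fractions import Fraction

def numerator_of_inverse_square_sum(p):
    """Numerator, in lowest terms, of 1/1^2 + 1/2^2 + ... + 1/(p-1)^2."""
    total = sum(Fraction(1, k * k) for k in range(1, p))
    return total.numerator

for p in (5, 7, 11, 13, 17, 19, 23):
    # Each numerator should be a multiple of p.
    print(p, numerator_of_inverse_square_sum(p) % p)
```

`Fraction` keeps every intermediate sum in lowest terms, so the numerator read off at the end is exactly the one the text refers to.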

22. Let p > 5 be a prime. Prove that $p^4$ divides the numerator of the fraction

\[ 2\sum_{k=1}^{p-1} \frac{1}{k} + p \sum_{k=1}^{p-1} \frac{1}{k^2} \]

when written in lowest terms.
Gabriel Dospinescu 


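Proof. The first step is to note that

\[ 2\sum_{k=1}^{p-1} \frac{1}{k} = \sum_{k=1}^{p-1} \left( \frac{1}{k} + \frac{1}{p-k} \right) = p \sum_{k=1}^{p-1} \frac{1}{k(p-k)}. \]

Thus, it is enough to prove that

\[ \sum_{k=1}^{p-1} \left( \frac{1}{k(p-k)} + \frac{1}{k^2} \right) \equiv 0 \pmod{p^3}. \]

Now, the crucial remark is that in the field of p-adic numbers we have the
convergent expansion

\[ \frac{1}{k(p-k)} = -\frac{1}{k^2} \cdot \frac{1}{1 - \frac{p}{k}} = -\frac{1}{k^2}\left(1 + \frac{p}{k} + \frac{p^2}{k^2} + \cdots\right), \]

which yields the congruence

\[ \frac{1}{k(p-k)} \equiv -\frac{1}{k^2} - \frac{p}{k^3} - \frac{p^2}{k^4} \pmod{p^3}. \qquad (3.3) \]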




Of course, one does not need p-adic numbers to find or check the previous 
congruence, since obtaining the appropriate polynomials in p and k and vali- 
dating that they work are each formal algebraic matters. The p-adic approach 
simply gives a straightforward route to both, here. 

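Using (3.3), it remains to prove that

\[ \sum_{k=1}^{p-1} \frac{1}{k^3} + p \sum_{k=1}^{p-1} \frac{1}{k^4} \equiv 0 \pmod{p^2}. \]

We will actually prove that

\[ \sum_{k=1}^{p-1} \frac{1}{k^3} \equiv 0 \pmod{p^2} \quad \text{and} \quad \sum_{k=1}^{p-1} \frac{1}{k^4} \equiv 0 \pmod{p}. \]

The second congruence follows from

\[ \sum_{k=1}^{p-1} \frac{1}{k^4} \equiv \sum_{k=1}^{p-1} k^4 \equiv 0 \pmod{p}, \]

the last congruence being established either by using the existence of primitive
roots mod p (which makes the corresponding sum the sum of a geometric
progression with ratio $g^4$, where g is a primitive root mod p) or simply by
using explicit formulae for this kind of sum. In order to prove the other
congruence, note that

\[ \frac{1}{k^3} + \frac{1}{(p-k)^3} = p \cdot \frac{k^2 - k(p-k) + (p-k)^2}{k^3 (p-k)^3} \equiv -\frac{3p}{k^4} \pmod{p^2}, \]

so

\[ \sum_{k=1}^{p-1} \frac{1}{k^3} \equiv -\frac{3p}{2} \sum_{k=1}^{p-1} \frac{1}{k^4} \equiv 0 \pmod{p^2}. \]

The result follows. □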
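
A quick numerical sanity check of the statement is possible with exact fractions. The sketch below assumes the fraction in the problem is $2\sum_{k=1}^{p-1} 1/k + p\sum_{k=1}^{p-1} 1/k^2$, which is the quantity the proof works with; the function name is ours.

```python
from fractions import Fraction

def problem22_numerator(p):
    """Numerator in lowest terms of 2*sum(1/k) + p*sum(1/k^2), k = 1..p-1."""
    harmonic = sum(Fraction(1, k) for k in range(1, p))
    squares = sum(Fraction(1, k * k) for k in range(1, p))
    return (2 * harmonic + p * squares).numerator

for p in (7, 11, 13, 17, 19, 23, 29):
    print(p, problem22_numerator(p) % p**4)
```

For p = 5 the numerator is 1625 = 5^3 · 13, so the hypothesis p > 5 cannot be dropped.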


The following problem is a generalization of a classical congruence due to 
Fleck. For the reader not familiar with p-adic numbers, it is worth reading 
the addendum 3.B before attacking the proof. 




23. Let p be a prime and n, s positive integers with n ≥ s + 1. Prove that
$p^d$ divides

\[ \sum_{\substack{0 \le k \le n \\ p \mid k}} (-1)^k k^s \binom{n}{k}, \]

where $d = \left\lfloor \frac{n-s-1}{p-1} \right\rfloor$.


Gabriel Dospinescu, Mathematical Reflections 


Proof. Fix a primitive root of unity z ∈ C of order p. The field K = Q(z) is
an extension of degree p − 1 of Q, as the minimal polynomial of z is

\[ X^{p-1} + X^{p-2} + \cdots + X + 1. \]

Choosing a prime (ideal) $\mathfrak{p}$ above p in the ring of algebraic integers $O_K$ of
K and completing $\mathfrak{p}$-adically, we obtain a valuation $v_p$ on K (seen inside its
$\mathfrak{p}$-adic completion), which extends the usual p-adic valuation on Q and such
that $v_p(1 - z) = \frac{1}{p-1}$. To prove the last equality, note that $\frac{1 - z^i}{1 - z}$ is a unit in
$O_K$ for all i relatively prime to p, so $v_p(1 - z) = v_p(1 - z^i)$. Since we also have

\[ \prod_{i=1}^{p-1} (1 - z^i) = p, \]

the result follows.
Note that

\[ \frac{1}{p} \sum_{j=0}^{p-1} z^{jk} \]

equals 0 if k is not a multiple of p and 1 otherwise. We deduce that

\[ \sum_{\substack{0 \le k \le n \\ p \mid k}} (-1)^k k^s \binom{n}{k} = \frac{1}{p} \sum_{j=0}^{p-1} \sum_{k=0}^{n} (-z^j)^k k^s \binom{n}{k}. \]


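Now, let n − s − 1 = d(p − 1) + r for some 0 ≤ r < p − 1. We will prove that

\[ v_p\left( \sum_{k=0}^{n} (-z^j)^k k^s \binom{n}{k} \right) > d \]

for all 0 ≤ j ≤ p − 1. This will imply that

\[ v_p\left( \sum_{\substack{0 \le k \le n \\ p \mid k}} (-1)^k k^s \binom{n}{k} \right) > d - 1 \]

and since this p-adic valuation is an integer, the result will follow.
Now, to prove the claim, we will use the following: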
Now, to prove the claim, we will use the following: 


Lemma 3.10. The polynomial $\sum_{k=0}^{n} k^s \binom{n}{k} X^k$ is a multiple of $(1 + X)^{n-s}$ for
all s ≤ n.


Proof. This is very easy: for s = 0 it is clear, and if

\[ \sum_{k=0}^{n} k^s \binom{n}{k} X^k = (1 + X)^{n-s} f(X) \]

then we get the inductive step by differentiating and multiplying both sides
by X. □


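Coming back to the proof, write

\[ \sum_{k=0}^{n} k^s \binom{n}{k} X^k = (1 + X)^{n-s} f(X) \]

for some f ∈ Z[X] (note that we necessarily have f ∈ Z[X], as $(1 + X)^{n-s}$
and $\sum_{k=0}^{n} k^s \binom{n}{k} X^k$ have integer coefficients and $(1 + X)^{n-s}$ is monic). Then
for² 1 ≤ j < p we have

\[ \sum_{k=0}^{n} (-z^j)^k k^s \binom{n}{k} = (1 - z^j)^{n-s} f(-z^j) \]

and so

\[ v_p\left( \sum_{k=0}^{n} (-z^j)^k k^s \binom{n}{k} \right) \ge (n-s)\, v_p(1 - z^j) = \frac{n-s}{p-1} = d + \frac{r+1}{p-1} > d. \]

Thus, the claim is proved and the result follows. □

²Note that by taking X = −1 in the previous relation we obtain $\sum_{k=0}^{n} (-1)^k k^s \binom{n}{k} = 0$,
so we only have to deal with j ≥ 1.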
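
The divisibility can be tested directly for small parameters. The sketch below is only an illustration (the name `fleck_sum` is ours); it also allows s = 0, which corresponds to Fleck's classical congruence and is covered by the same proof, with the convention $0^0 = 1$ used by Python.

```python
from math import comb

def fleck_sum(p, n, s):
    """Sum of (-1)^k * k^s * C(n, k) over 0 <= k <= n with p dividing k."""
    return sum((-1) ** k * k ** s * comb(n, k) for k in range(0, n + 1, p))

def divisibility_holds(p, n, s):
    """Check that p^d divides the sum, where d = floor((n - s - 1) / (p - 1))."""
    d = (n - s - 1) // (p - 1)
    return fleck_sum(p, n, s) % p ** d == 0

print(all(divisibility_holds(p, n, s)
          for p in (2, 3, 5, 7)
          for n in range(1, 21)
          for s in range(0, n)))
```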


We end this chapter with a wonderful result due to Skolem, Mahler and 
Lech. It truly shows what a versatile tool p-adic analysis can be. Even though 
the result still holds for p = 2, we will assume that p is odd in order to avoid 
some technical problems. 


24. a) Let p be an odd prime and let $a_0, a_1, \ldots$ be integers such that

\[ \sum_{k=0}^{n} p^k \binom{n}{k} a_k = 0 \]

for infinitely many positive integers n. Prove that for all n we have
that $a_n = 0$.


b) A sequence $(a_n)_n$ of integers satisfies

\[ a_{n+d} = x_1 a_{n+d-1} + x_2 a_{n+d-2} + \cdots + x_d a_n \]

for all n ≥ 0, where d ≥ 1 and $x_1, x_2, \ldots, x_d$ are integers. Prove that
there exists a finite set S and integers $c_1, c_2, \ldots, c_N$, $d_1, d_2, \ldots, d_N$
such that

\[ \{ n \ge 0 \mid a_n = 0 \} = S \cup (c_1 + d_1 \mathbb{N}) \cup \cdots \cup (c_N + d_N \mathbb{N}). \]
Skolem-Mahler-Lech theorem 


Proof. a) Consider the following function

\[ f(x) = \sum_{k \ge 0} p^k a_k \binom{x}{k}. \]

Of course, such a series cannot converge as a series of real numbers, but it does
converge as a series of p-adic numbers, since $v_p\big(p^k a_k \binom{x}{k}\big) \ge k$ for all $x \in \mathbb{Z}_p$
(note that $\binom{x}{k} \in \mathbb{Z}_p$ because the map $x \mapsto \binom{x}{k}$ is continuous and takes values in
$\mathbb{Z}_p$ when x is in the dense subset Z of $\mathbb{Z}_p$). Thus f defines a map $f : \mathbb{Z}_p \to \mathbb{Z}_p$.
The crucial property of f is that it interpolates the sequence $\sum_{k=0}^{n} p^k \binom{n}{k} a_k$
and that it converges to its Taylor series (has good analytic behavior). The
first claim is obvious, since by definition we have

\[ f(n) = \sum_{k=0}^{n} p^k \binom{n}{k} a_k. \]

The second claim is more delicate. Let us write

\[ \binom{x}{k} = \frac{1}{k!} \sum_{j=0}^{k} b_{j,k} x^j \]

for some integers $b_{j,k}$. Then we can write

\[ f(x) = \sum_{k \ge 0} \frac{p^k a_k}{k!} \sum_{j=0}^{k} b_{j,k} x^j = \sum_{j \ge 0} c_j x^j, \]

where $c_j = \sum_{k \ge j} \frac{p^k a_k b_{j,k}}{k!}$. Note that the series defining $c_j$ converges, since

\[ v_p\left( \frac{p^k a_k b_{j,k}}{k!} \right) \ge k - \frac{k}{p-1} = k \cdot \frac{p-2}{p-1} \]

tends to ∞ as k → ∞. The same estimate shows that

\[ v_p(c_j) \ge \inf_{k \ge j} \left( k - \frac{k}{p-1} \right) = j \cdot \frac{p-2}{p-1} \]

(this is why we assumed that p > 2). The conclusion is that f converges to its
Taylor series and so behaves as a holomorphic function, i.e. has good analytic
properties.

Now, by hypothesis we know that f(n) = 0 for infinitely many integers
n. We want to prove that f = 0. This will imply that f(n) = 0 for all n and
it is easy to see that this forces $a_n = 0$ for all n. Since $\mathbb{Z}_p$ is compact, we
deduce that there exists $a \in \mathbb{Z}_p$ and an infinite sequence of integers $n_j$ such
that $f(n_j) = 0$ and $n_j$ converges p-adically to a. Now, for all $x \in \mathbb{Z}_p$ we can
write

\[ f(x) = \sum_{j \ge 0} c_j \big((x - a) + a\big)^j = \sum_{j \ge 0} c_j \sum_{k=0}^{j} \binom{j}{k} a^{j-k} (x - a)^k = \sum_{k \ge 0} d_k (x - a)^k. \]

Again, the series defining $d_k = \sum_{j \ge k} \binom{j}{k} c_j a^{j-k}$ converges because $v_p(c_j) \to \infty$
and we also have

\[ v_p(d_k) \ge \inf_{j \ge k} v_p(c_j) \ge k \cdot \frac{p-2}{p-1}. \]

Recall that $f(n_j) = 0$ for all j. On the other hand

\[ v_p\left( \sum_{k \ge 1} d_k (n_j - a)^k \right) \ge v_p(n_j - a) \to \infty, \]

so that $\lim_{j \to \infty} f(n_j) - d_0 = 0$. We deduce that $d_0 = 0$. Dividing the equality
$f(n_j) = 0$ by $a - n_j$ and repeating the argument yields $d_1 = 0$, then $d_2 = 0$
and so on. We deduce that all $d_k$'s are zero and so f = 0. The result follows.
b) We may assume that $x_d \ne 0$. Consider the matrix M defined by

\[ M_{ij} = 1_{j = i+1} \]

for i < d and whose last row is $x_d, x_{d-1}, \ldots, x_1$. This is the companion
matrix associated to the characteristic polynomial $X^d - x_1 X^{d-1} - \cdots - x_d$ of
the recursive relation. Let $V_n$ be the column vector whose coordinates are
$a_n, a_{n+1}, \ldots, a_{n+d-1}$. Then the recursive relation becomes $V_{n+1} = M V_n$, thus
$V_n = M^n V_0$. If $e_1$ is the column vector whose coordinates are 1, 0, 0, ..., 0 and
if ⟨ , ⟩ is the standard inner product in $\mathbb{R}^d$, we deduce that $a_n = \langle M^n V_0, e_1 \rangle$. It
is easy to check that det M equals $x_d$ up to a sign. Pick a prime $p > 2 + |x_d|$,
so M is invertible mod p. Using either Lagrange's theorem or the pigeonhole
principle, we can find r ≥ 1 such that $M^r \equiv I_d$ (mod p). Thus we can write
$M^r = I_d + pN$ for some matrix N with integral coefficients. It is then enough
to prove that for any 0 ≤ j ≤ r − 1, the set $A_j = \{ n \ge 0 \mid a_{nr+j} = 0 \}$ is either
finite or contains all nonnegative integers. In order to prove this, the point is
to write

\[ a_{nr+j} = \langle (I_d + pN)^n M^j V_0, e_1 \rangle = \sum_{k=0}^{n} p^k \binom{n}{k} b_k, \]

where $b_k = \langle N^k M^j V_0, e_1 \rangle$ is a sequence of integers. If $A_j$ is infinite, then part
a) of the problem shows that $A_j$ contains all nonnegative integers and the
result follows. □
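
The first steps of part b) are entirely computational: build the companion matrix and find an exponent r with $M^r \equiv I_d$ (mod p). The sketch below illustrates this on the Fibonacci recurrence; the helper names are ours, and the loop in `order_mod_p` terminates only under the assumption made in the proof, namely that M is invertible mod p.

```python
def companion(xs):
    """Companion matrix of a_{n+d} = xs[0]*a_{n+d-1} + ... + xs[d-1]*a_n.

    Rows 1..d-1 have a single 1 just right of the diagonal; the last row
    is x_d, x_{d-1}, ..., x_1, as in the proof.
    """
    d = len(xs)
    rows = [[1 if j == i + 1 else 0 for j in range(d)] for i in range(d - 1)]
    rows.append(list(reversed(xs)))
    return rows

def mat_mul_mod(A, B, p):
    """Product of two square matrices with entries reduced mod p."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) % p for j in range(d)]
            for i in range(d)]

def order_mod_p(M, p):
    """Smallest r >= 1 with M^r congruent to the identity mod p.

    Assumes M is invertible mod p; otherwise the loop never stops.
    """
    d = len(M)
    identity = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    power = [[entry % p for entry in row] for row in M]
    r = 1
    while power != identity:
        power = mat_mul_mod(power, M, p)
        r += 1
    return r

fibonacci = companion([1, 1])      # a_{n+2} = a_{n+1} + a_n
print(order_mod_p(fibonacci, 5))   # one admissible choice of r for p = 5
```

Once r is known, the zero set of the sequence is studied separately along each residue class mod r, exactly as in the proof.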


Remark 3.11. Under the hypothesis p > 2, the map f is analytic on the whole 
Zp. If p = 2, one can still prove (using for instance Amice’s theorem 3.B.30) 
that f is locally analytic on Zp. The proof is then exactly the same as in the 
case p > 2. 


Remark 3.12. The result holds for sequences with values in any field of
characteristic 0, as Lech proved. The key point is that we have a p-adic version
of the Lefschetz principle (the proof is not easy, but elementary, see [13] or
[45]): if S is a finite subset of a field K which is finitely generated over Q,
then for infinitely many primes p there is an embedding of K into $\mathbb{Q}_p$ sending
all elements of S to $\mathbb{Z}_p$. Applied to the roots of the characteristic polynomial
of the recurrence relation, this reduces the proof to the p-adic case, which has
already been discussed.


Remark 3.13. The result does not hold for fields of positive characteristic.
For instance, the sequence $a_n = (1 + t)^n - 1 - t^n$ is linearly recursive with
values in $\mathbb{F}_p((t))$, but the reader can easily check that it vanishes precisely at
$\{ p^n \mid n \ge 0 \}$, which is not the union of a finite set and finitely many arithmetic
progressions.




3.7 Miscellaneous problems 


Before attacking the following problem, let us recall a classical property
of the Fermat number $F_n = 2^{2^n} + 1$, namely that any prime factor p of $F_n$
satisfies p ≡ 1 (mod $2^{n+1}$). Indeed, if p divides $F_n$, then p divides $2^{2^{n+1}} - 1$,
but does not divide $2^{2^n} - 1$. Thus the order of 2 mod p is $2^{n+1}$ and since this
order divides p − 1, the result follows. The next problem is a generalization of
some very classical problems, such as the following one: find all integers n such
that the congruence xy ≡ 1 (mod n) implies the congruence x ≡ y (mod n).
It is easy to see that this is equivalent to $n \mid x^2 - 1$ for any x relatively prime
to n and that this happens if and only if n divides 24. The next problem is a
bit trickier.
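
Both classical facts recalled above are easy to confirm by machine. The following sketch (function name ours) lists the integers n below 200 for which $x^2 \equiv 1$ (mod n) holds for every x coprime to n, and checks the congruence for the prime factor 641 of $F_5$.

```python
from math import gcd

def squares_are_one(n):
    """True if x^2 = 1 (mod n) for every x in 1..n that is coprime to n."""
    return all((x * x - 1) % n == 0 for x in range(1, n + 1) if gcd(x, n) == 1)

print([n for n in range(1, 200) if squares_are_one(n)])  # the divisors of 24

# 641 divides the Fermat number F_5 = 2^32 + 1 and is 1 mod 2^6.
print((2**32 + 1) % 641, 641 % 2**6)
```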


25. Let m be an integer greater than 1. Suppose that a positive integer n
satisfies $n \mid a^m - 1$ for all integers a relatively prime to n. Prove that
$n \le 4m(2^m - 1)$ and find all cases of equality.


Gabriel Dospinescu, Marian Andronache, Romanian TST 2004 


Proof. Write $n = 2^k r$ with r odd. By the Chinese Remainder Theorem, there
is an a with a ≡ 3 (mod $2^k$) and a ≡ 2 (mod r). Such an a is clearly relatively
prime to n, so $n \mid a^m - 1$. Hence $2^m \equiv a^m \equiv 1$ (mod r) and $3^m \equiv a^m \equiv 1$
(mod $2^k$). Thus $r \mid 2^m - 1$ and $2^k \mid 3^m - 1$. One easily checks that $v_2(3^m - 1) \le
2 + v_2(m)$, hence $2^k \mid 4m$. We conclude that $n \mid 4m(2^m - 1)$ and in particular
$n \le 4m(2^m - 1)$.

Now suppose equality holds. The above argument actually shows that
$n \le 4 \cdot 2^{v_2(m)} \cdot (2^m - 1)$, so for equality to hold we must have $m = 2^{v_2(m)}$, that
is, m must be a power of 2. Suppose $m = 2^s$. Then

\[ n = 2^{s+2}\left(2^{2^s} - 1\right) = 2^{s+2} \cdot 3 \cdot 5 \cdots \left(2^{2^{s-1}} + 1\right), \]

that is, n is a power of 2 times some number of consecutive Fermat numbers.
The cases s = 1 and s = 2 give two equality cases (m, n) = (2, 24) and (m, n) =
(4, 240). Suppose now that s ≥ 3, and let p be a prime divisor of $2^{2^{s-1}} + 1$.
The preliminary discussion shows that p ≡ 1 (mod $2^s$), in particular p ≡ 1
(mod 8), so that by quadratic reciprocity, 2 is a square mod p. The Chinese
Remainder Theorem gives an a relatively prime to n with $a^2 \equiv 2$ (mod p).
Then $n \mid a^m - 1$, so $2^{2^{s-1}} \equiv a^{2^s} = a^m \equiv 1$ (mod p). But this contradicts the
fact that $2^{2^{s-1}} \equiv -1$ (mod p). Thus there are no solutions with s ≥ 3. □
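
For small m the largest admissible n can be found by brute force, using the fact established in the proof that n must divide $4m(2^m - 1)$. This is only an illustrative sketch; the function names are ours.

```python
from math import gcd

def works(n, m):
    """True if n divides a^m - 1 for every a in 1..n coprime to n."""
    return all(pow(a, m, n) == 1 % n for a in range(1, n + 1) if gcd(a, n) == 1)

def largest_n(m):
    """Largest n with the property, searched among divisors of 4m(2^m - 1)."""
    bound = 4 * m * (2**m - 1)
    return max(n for n in range(1, bound + 1) if bound % n == 0 and works(n, m))

for m in (2, 3, 4, 8):
    print(m, largest_n(m), 4 * m * (2**m - 1))
```

The output shows equality for m = 2 and m = 4 only, in agreement with the discussion of the cases s = 1, 2 and s ≥ 3.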


Here is a beautiful problem, for which we present two elementary proofs, 
both ingenious, neither quite natural. In the addendum to this chapter, we 
will present a more natural proof which uses p-adic analysis. 


26. Let $x_n$ be the exponent of 2 in the prime factorization of the numerator
of $\frac{2}{1} + \frac{2^2}{2} + \cdots + \frac{2^n}{n}$, when written in lowest terms. Prove that

\[ \lim_{n \to \infty} x_n = \infty \]

and that $x_{2^n} \ge 2^n - n + 1$.
Adapted from a Kvant problem 


Proof. The first proof is based on the following nice identity known as Staver's
identity, which also appeared as a problem in the USA TST 2000.


Lemma 3.14. For any positive integer n we have

\[ \frac{1}{\binom{n}{0}} + \frac{1}{\binom{n}{1}} + \cdots + \frac{1}{\binom{n}{n}} = \frac{n+1}{2^{n+1}} \left( \frac{2}{1} + \frac{2^2}{2} + \cdots + \frac{2^{n+1}}{n+1} \right). \]


Let us accept this lemma for a moment and see how we can conclude.
Using the extension of $v_2$ to all rational numbers, we can write

\[ v_2\left( \sum_{k=1}^{n+1} \frac{2^k}{k} \right) \ge n + 1 - v_2(n+1) - \max_{0 \le j \le n} v_2\left( \binom{n}{j} \right). \]

Since

\[ v_2\left( \binom{n}{j} \right) = s_2(j) + s_2(n - j) - s_2(n) \le 2(1 + \log_2(n)) - 1 \]

and since $v_2(n+1) \le \log_2(n+1)$, this trivially implies that $x_n \to \infty$. Moreover,
since each of the numbers $\binom{2^N - 1}{j}$ is odd and since there are an even number
of terms in the sum $\sum_{j=0}^{2^N - 1} \frac{1}{\binom{2^N - 1}{j}}$, it follows that $x_{2^N} \ge 2^N - N + 1$, ending
the proof of the second part of the problem.

Now, let us prove the lemma. We will present two completely different 
approaches, the first one very down-to-earth, the second one more exotic, but 
more powerful also. 

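Denote the left-hand side by $a_n$ and the right-hand side by $b_n$. Clearly,
$a_0 = b_0 = 1$. Note that

\[ b_n = \frac{n+1}{2n} \cdot \frac{n}{2^n} \left( \frac{2}{1} + \frac{2^2}{2} + \cdots + \frac{2^n}{n} \right) + 1 = \frac{n+1}{2n}\, b_{n-1} + 1. \]

On the other hand,

\[
\begin{aligned}
\frac{n+1}{2n}\, a_{n-1} + 1 &= 1 + \frac{n+1}{2n} \sum_{i=0}^{n-1} \frac{i!\,(n-1-i)!}{(n-1)!} \\
&= 1 + \frac{1}{2} \sum_{i=0}^{n-1} \frac{\big((i+1) + (n-i)\big)\, i!\,(n-1-i)!}{n \cdot (n-1)!} \\
&= 1 + \frac{1}{2} \sum_{i=0}^{n-1} \frac{(i+1)!\,(n-1-i)! + i!\,(n-i)!}{n!} \\
&= 1 + \frac{1}{2} \left( \sum_{i=1}^{n} \binom{n}{i}^{-1} + \sum_{i=0}^{n-1} \binom{n}{i}^{-1} \right) \\
&= a_n.
\end{aligned}
\]

Taking into account these two relations, it follows by induction that $a_n =
b_n$ for all n and the lemma is proved.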
Now, let us turn to the second approach. We will use here the classical
formula

\[ \int_0^1 t^a (1 - t)^b \, dt = \frac{a!\, b!}{(a + b + 1)!}, \]

which can be proved separately for each value of a + b and then by induction
on say a and integration by parts. We deduce that

\[ \sum_{k=0}^{n} \frac{1}{\binom{n}{k}} = (n+1) \sum_{k=0}^{n} \int_0^1 t^k (1 - t)^{n-k} \, dt = (n+1) \int_0^1 \frac{t^{n+1} - (1 - t)^{n+1}}{2t - 1} \, dt. \]

Making the substitution 2t − 1 = s yields

\[ \int_0^1 \frac{t^{n+1} - (1 - t)^{n+1}}{2t - 1} \, dt = \frac{1}{2^{n+2}} \int_{-1}^{1} \frac{(1 + s)^{n+1} - (1 - s)^{n+1}}{s} \, ds \]

and noting that the integrand is an even function of s, the last expression is
also equal to

\[
\begin{aligned}
\frac{1}{2^{n+1}} \int_0^1 \left( \frac{(1+s)^{n+1} - 1}{s} + \frac{1 - (1-s)^{n+1}}{s} \right) ds
&= \frac{1}{2^{n+1}} \sum_{i=0}^{n} \int_0^1 \big( (1+s)^i + (1-s)^i \big) \, ds \\
&= \frac{1}{2^{n+1}} \sum_{i=0}^{n} \left( \frac{2^{i+1} - 1}{i+1} + \frac{1}{i+1} \right),
\end{aligned}
\]

from where the result follows trivially. □





Proof. This second elementary solution actually mimics the p-adic analytic
proof, without mentioning p-adic numbers. Define the sequence of polynomials

\[ L_n(X) = \frac{X}{1} + \frac{X^2}{2} + \cdots + \frac{X^n}{n} \]

and $F_n(X) = L_n(2X - X^2) - 2L_n(X)$. Since

\[
\begin{aligned}
F_n'(X) &= 2(1 - X)\, L_n'(2X - X^2) - 2 L_n'(X) \\
&= \frac{2X^n \big(1 - (2 - X)^n\big)}{1 - X} \\
&= -2X^n \big( 1 + (2 - X) + \cdots + (2 - X)^{n-1} \big)
\end{aligned}
\]

is of the form $X^n (a_0 + a_1 X + \cdots + a_{n-1} X^{n-1})$ for some integers $a_i$ and moreover
$F_n(0) = 0$, we deduce that we can write

\[ F_n(X) = X^{n+1} \left( \frac{a_0}{n+1} + \frac{a_1}{n+2} X + \cdots + \frac{a_{n-1}}{2n} X^{n-1} \right). \]

Therefore, since $L_n(2) = -\frac{1}{2} F_n(2)$, we can write

\[ \frac{2}{1} + \frac{2^2}{2} + \cdots + \frac{2^n}{n} = -\sum_{k=0}^{n-1} \frac{2^{n+k}\, a_k}{n + k + 1}, \]

immediately implying the estimate

\[ v_2\left( \frac{2}{1} + \frac{2^2}{2} + \cdots + \frac{2^n}{n} \right) \ge \min_{0 \le k \le n-1} \big( n + k - v_2(n + k + 1) \big). \]

As $v_2(n + k + 1) \le \log_2(n + k + 1) \le \log_2(2n)$, the fact that $x_n \to \infty$ is now
clear. The other inequality is also trivial using the previous estimates. □
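
Both Staver's identity and the bound $x_{2^N} \ge 2^N - N + 1$ can be tested with exact rational arithmetic. The following is a sketch under the notation of the problem; the function names are ours.

```python
from fractions import Fraction
from math import comb

def power_sum(n):
    """The rational number 2/1 + 2^2/2 + ... + 2^n/n."""
    return sum(Fraction(2**k, k) for k in range(1, n + 1))

def staver_holds(n):
    """Check sum_k 1/C(n,k) == (n+1)/2^(n+1) * power_sum(n+1)."""
    left = sum(Fraction(1, comb(n, k)) for k in range(n + 1))
    right = Fraction(n + 1, 2 ** (n + 1)) * power_sum(n + 1)
    return left == right

def v2_of_numerator(q):
    """Exponent of 2 in the numerator of a nonzero fraction q (lowest terms)."""
    numerator, count = q.numerator, 0
    while numerator % 2 == 0:
        numerator //= 2
        count += 1
    return count

print(all(staver_holds(n) for n in range(1, 31)))
for N in range(1, 7):
    print(N, v2_of_numerator(power_sum(2**N)), 2**N - N + 1)
```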


We continue with a rather technical problem. The solution we present 
is long and complicated, but nevertheless rather natural. This problem also 
appeared on the IMO 2007 Shortlist. 

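27. Find the exponent of 2 in the prime factorization of the number

\[ \binom{2^{n+1}}{2^n} - \binom{2^n}{2^{n-1}}. \]

J. Desmong, W.R. Hastings, AMM E 2640

Proof. Note that

\[ \frac{\binom{2^{n+1}}{2^n}}{\binom{2^n}{2^{n-1}}} = \frac{1 \cdot 3 \cdots (2^{n+1} - 1)}{\big(1 \cdot 3 \cdots (2^n - 1)\big)^2} = \frac{(2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1)}{1 \cdot 3 \cdots (2^n - 1)}, \]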


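so that we need to compute

\[ v_2\left( \binom{2^n}{2^{n-1}} \left( \frac{(2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1)}{1 \cdot 3 \cdots (2^n - 1)} - 1 \right) \right)
= 1 + v_2\big( (2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1) - 1 \cdot 3 \cdots (2^n - 1) \big). \]

Looking at small values of n, we conjecture that for n ≥ 2

\[ v_2\big( (2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1) - 1 \cdot 3 \cdots (2^n - 1) \big) = 3n - 1. \]

To prove this, we start by expanding the product

\[ (2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1) \]

as if it were a polynomial in $2^n$. If we work mod $2^{3n}$, the only terms that
count are the first three: if $a = 1 \cdot 3 \cdots (2^n - 1)$, then

\[ (2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1) \equiv a + a \cdot 2^n \sum_{i=0}^{2^{n-1} - 1} \frac{1}{2i + 1} + 2^{2n} \cdot a \sum_{0 \le i < j \le 2^{n-1} - 1} \frac{1}{(2i + 1)(2j + 1)} \pmod{2^{3n}}. \]

For simplicity, let $x_i = \frac{1}{2i + 1}$ and $y_i = \frac{1}{2^n - (2i + 1)}$. Then

\[ 2 \sum_{i=0}^{2^{n-1} - 1} x_i = \sum_{i=0}^{2^{n-1} - 1} (x_i + y_i) = 2^n \sum_{i=0}^{2^{n-1} - 1} x_i y_i, \]

which combined with the previous observation yields

\[ v_2\big( (2^n + 1)(2^n + 3) \cdots (2^n + 2^n - 1) - 1 \cdot 3 \cdots (2^n - 1) \big) = 2n - 1 + v_2\left( \sum_{i=0}^{2^{n-1} - 1} x_i y_i + 2 \sum_{0 \le i < j \le 2^{n-1} - 1} x_i x_j \right). \]

We will assume that n > 2. Then

\[ v_2\left( \sum_{i=0}^{2^{n-1} - 1} x_i \right) = v_2\left( 2^{n-1} \sum_{i=0}^{2^{n-1} - 1} x_i y_i \right) \ge n - 1, \]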
© 


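so

\[ 2 \sum_{0 \le i < j \le 2^{n-1} - 1} x_i x_j = \left( \sum_{i=0}^{2^{n-1} - 1} x_i \right)^2 - \sum_{i=0}^{2^{n-1} - 1} x_i^2 \equiv -\sum_{i=0}^{2^{n-1} - 1} x_i^2 \pmod{2^{n+1}}. \]

Since

\[ x_i y_i = x_i \cdot \frac{x_i}{2^n x_i - 1} = -\frac{x_i^2}{1 - 2^n x_i} \equiv -x_i^2 - 2^n x_i^3 \pmod{2^{2n}}, \]

we obtain

\[ \sum_{i=0}^{2^{n-1} - 1} x_i y_i + 2 \sum_{0 \le i < j \le 2^{n-1} - 1} x_i x_j \equiv -2 \sum_{i=0}^{2^{n-1} - 1} x_i^2 - 2^n \sum_{i=0}^{2^{n-1} - 1} x_i^3 \pmod{2^{n+1}}. \]

Finally, we clearly have

\[ 2^n \sum_{i=0}^{2^{n-1} - 1} x_i^3 \equiv 0 \pmod{2^{n+1}} \]

and

\[ 2 \sum_{i=0}^{2^{n-1} - 1} x_i^2 \equiv 2 \sum_{i=0}^{2^{n-1} - 1} (2i + 1)^2 = \frac{2^n (4^n - 1)}{3} \pmod{2^{n+1}}, \]

because the remainders mod $2^n$ of the numbers $x_i$ are the same as the
remainders of the numbers 2i + 1. Putting everything together, the result
follows. □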
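
Since the proof shows that the quantity in the problem has 2-adic valuation 1 + (3n − 1) = 3n for n ≥ 2, a direct computation for small n is a useful sanity check. The sketch below uses helper names of our own choosing.

```python
from math import comb

def v2(n):
    """Exponent of 2 in a nonzero integer n."""
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def exponent_of_two(n):
    """v_2 of C(2^(n+1), 2^n) - C(2^n, 2^(n-1)), for n >= 1."""
    return v2(comb(2 ** (n + 1), 2 ** n) - comb(2 ** n, 2 ** (n - 1)))

for n in range(1, 10):
    print(n, exponent_of_two(n), 3 * n)
```

For n = 1 the difference is 6 − 2 = 4, so the value 3n is only reached from n = 2 on.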


Remark 3.15. There are incredibly many congruences concerning binomial
coefficients. A very good exercise for the reader is to prove the following
theorem of Ljunggren from 1952: for any prime p ≥ 5 and any integers a and b, the
number $\binom{pa}{pb} - \binom{a}{b}$ is a multiple of $p^3$. A more difficult result, due to Zieve
(1999), is the fact that $\binom{p^n a}{p^n b} - \binom{p^{n-1} a}{p^{n-1} b}$ is a multiple of $p^{3n}$ for any n ≥ 1 and any
integers a, b, if p is a prime greater than 3.




The last problem of the chapter is quite technical, but also very beautiful. 
A much weaker result was proposed as problem 3 at the IMO 2008. This kind 
of result is actually very old and classical. 


28. Prove that for any c > 0 there are infinitely many n such that the largest
prime divisor of $n^2 + 1$ is greater than cn.


Chebyshev, Nagell 


Proof. Let f(n) be the largest prime factor of $P_n = (1^2 + 1)(2^2 + 1) \cdots (n^2 + 1)$.
We will prove that $\lim_{n \to \infty} \frac{f(n)}{n} = \infty$. This is enough to deduce the desired result:
indeed, choose any c > 0, then for sufficiently large n we have f(n) > cn. By
definition there exists $k_n \le n$ such that $f(n) \mid k_n^2 + 1$. Then the largest prime
factor of $k_n^2 + 1$ is greater than $c k_n$ and we are done (note that $k_n \to \infty$ as
n → ∞, since f(n) divides $k_n^2 + 1$).

To prove the result on f(n), we will bound the p-adic valuation of prime
factors of $P_n$. Let $A_n$ be the set of prime numbers smaller than or equal to
n and such that p ≡ 1 (mod 4). Any odd prime factor of $P_n$ is in the set $A_{n^2+1}$.
The following lemma is the first crucial ingredient:


Lemma 3.16. a) For any odd prime p we have

\[ v_p(P_n) \le \log_p(n^2 + 1) + \frac{2n}{p - 1}. \]

b) We have that $v_2(P_n) = \left\lceil \frac{n}{2} \right\rceil$.


Proof. The second part follows from the fact that the factors of $P_n$ are
alternately congruent to 2 (mod 8) and 1 (mod 4). To prove a), let $n_i$ be the
number of j ∈ {1, 2, ..., n} such that $p^i \mid j^2 + 1$. Clearly

\[ v_p(P_n) = \sum_{k=1}^{n} v_p(k^2 + 1) = \sum_{i=1}^{\lfloor \log_p(n^2 + 1) \rfloor} n_i. \]

Note that if p > 2 and $p^i$ divides both $j^2 + 1$ and $k^2 + 1$, then $p^i$ divides j − k
or j + k. Indeed, $p^i$ divides (j − k)(j + k) and p cannot divide both j − k, j + k
as p does not divide j, k and p > 2. This being said, if $j_1$ is the smallest index
with $p^i \mid j_1^2 + 1$, then all k such that $p^i \mid k^2 + 1$ are of the form $j_1 + s p^i$ for some
$0 \le s \le \frac{n - j_1}{p^i}$ or of the form $-j_1 + t p^i$ for some $1 \le t \le \frac{n + j_1}{p^i}$. We easily
deduce that $n_i \le 1 + \frac{2n}{p^i}$ when p is odd and the result follows by adding up
these estimates. □


Using the previous observations, we can write ($c_1 > 0$ is an absolute
constant and π(n) is the number of prime numbers not exceeding n)

\[
\begin{aligned}
2 \log n! < \log P_n &= \sum_{\substack{p \le f(n) \\ p \in A_{n^2+1}}} v_p(P_n) \log p + v_2(P_n) \log 2 \\
&\le \sum_{\substack{p \le f(n) \\ p \in A_{n^2+1}}} \left( \log_p(n^2 + 1) + \frac{2n}{p - 1} \right) \log p + c_1 n \\
&\le 3 \log n \cdot \pi(f(n)) + 2n \sum_{\substack{p \le f(n) \\ p \in A_{n^2+1}}} \frac{\log p}{p - 1} + c_1 n.
\end{aligned}
\]

Now, corollary 3.A.2 shows that $\pi(n) \le c_2 \frac{n}{\log n}$ for all n and some absolute
constant $c_2 > 0$. The crucial fact is that

\[ \sum_{\substack{p \le f(n) \\ 4 \mid p - 1}} \frac{\log p}{p - 1} = \frac{\log f(n)}{2} + O(1), \]

a nontrivial result that follows from our proof of Dirichlet's theorem (see
addendum 7.A for more details). Taking this into account and dividing the
previous inequality by n log n yields

\[ \frac{2 \log n!}{n \log n} \le \frac{3 c_2 f(n)}{n \log f(n)} + \frac{\log f(n)}{\log n} + O\left( \frac{1}{\log n} \right). \]

Combining this with the fact that $\lim_{n \to \infty} \frac{\log n!}{n \log n} = 1$ easily yields what we
want here. □




Remark 3.17. Nagell [60] actually proved the following general result: if f ∈
Z[X] is not a product of linear factors with integer coefficients and if P(x) is the
largest prime factor of f(1)f(2)···f(x), then there exists c > 0 (depending on
f) such that P(x) > cx log x. The proof consists in a refinement of the previous
method, combined with rather deep estimates in analytic and algebraic number
theory (more precisely, the prime ideal theorem). Erdős [27] proved that there
exists c > 0 depending on f such that $P(x) > x (\log x)^{c \log\log\log x}$. It is a very
deep theorem of Hooley [41] that there exists c > 0 such that for all x ≥ 2 the
largest prime factor of $\prod_{n \le x} (n^2 + 1)$ is greater than $c x^{11/10}$. The proofs of these
theorems are very involved.


3.8 Notes 


We would like to thank the following people for providing solutions to 
some of the problems in this chapter: Xiangyi Huang (problems 8,26), Ofir 
Nachum (problem 10), Fedja Nazarov (problem 8), Richard Stong (problems 
3, 11, 20, 25, 27), Victor Wang (problems 8, 21), Gjergji Zaimi (problems 3, 
4, 25), Alex Zhu (problem 27). 


Addendum 3.A Classical Estimates on
Prime Numbers 


Since we are using quite a lot of estimates about prime numbers in various 
places of this book, gathering these results in one addendum seemed more 
than appropriate. All results here are absolutely classical and go back to the 
beautiful ideas of Chebyshev, who was probably the first person to put some 
order in the chaotic world of prime numbers. These ideas were revisited by 
Erdős and their exposition is heavily influenced by this “update.” Basically,
everything follows from some very smart applications of Legendre’s formula 
to middle binomial coefficients. 

For the reader’s convenience, let us recall some standard notation. We let
π(n) be the number of primes less than or equal to n. If g is a map taking
positive values when the argument is large enough and if f is any
complex-valued map, we say that f = O(g) (respectively f = o(g)) if $\frac{f(x)}{g(x)}$ is bounded
(respectively tends to 0) when x → ∞. The crucial estimate that we will use
when dealing with the behavior of π(n) is the following easy consequence of
Legendre’s formula.


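Theorem 3.A.1. For any n ≥ 2, $\binom{n}{\lfloor n/2 \rfloor}$ divides $\prod_{p \le n} p^{\lfloor \log_p n \rfloor}$ and is a multiple
of $\prod_{\lfloor (n+1)/2 \rfloor < p \le n} p$.


Proof. The second part follows immediately from the identity (note that $n =
\lfloor \frac{n}{2} \rfloor + \lfloor \frac{n+1}{2} \rfloor$)

\[ \binom{n}{\lfloor \frac{n}{2} \rfloor} \cdot \left\lfloor \frac{n}{2} \right\rfloor ! = \prod_{\lfloor \frac{n+1}{2} \rfloor < j \le n} j \]

and the fact that $\prod_{\lfloor (n+1)/2 \rfloor < p \le n} p$ divides the right-hand side and is relatively
prime to $\lfloor \frac{n}{2} \rfloor !$. For the first part, Legendre's theorem yields

\[ v_p\left( \binom{n}{\lfloor \frac{n}{2} \rfloor} \right) = \sum_{j \ge 1} \left( \left\lfloor \frac{n}{p^j} \right\rfloor - \left\lfloor \frac{\lfloor n/2 \rfloor}{p^j} \right\rfloor - \left\lfloor \frac{\lfloor (n+1)/2 \rfloor}{p^j} \right\rfloor \right). \]

As for all a, b ∈ R we have ⌊a + b⌋ − ⌊a⌋ − ⌊b⌋ ∈ {0, 1}, all terms in the sum are
equal to 0 or 1 and all terms for $j > \log_p n$ are zero. Thus $v_p\left( \binom{n}{\lfloor n/2 \rfloor} \right) \le \lfloor \log_p n \rfloor$
and the result follows. □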
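
Both divisibility statements of Theorem 3.A.1 can be checked for small n. The sketch below is only an illustration (helper names ours); it uses a plain sieve for the primes.

```python
from math import comb

def primes_up_to(n):
    """List of primes <= n (simple sieve of Eratosthenes), for n >= 1."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def theorem_3A1_holds(n):
    """Check both divisibilities of Theorem 3.A.1 for a given n >= 2."""
    central = comb(n, n // 2)
    upper, lower = 1, 1
    for p in primes_up_to(n):
        q = p
        while q * p <= n:          # q becomes p^floor(log_p n)
            q *= p
        upper *= q
        if p > (n + 1) // 2:
            lower *= p
    return upper % central == 0 and central % lower == 0

print(all(theorem_3A1_holds(n) for n in range(2, 301)))
```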


The famous (and deep) prime number theorem asserts that $\frac{\pi(n) \log n}{n}$
converges to 1 as n → ∞. The following result gives a uniform lower bound
estimate. Of course, it is weaker than the prime number theorem, but it is
rather amazing that with so few tools it already gives the “correct” lower
bound. Note that log 2 is approximately 0.69.


Corollary 3.A.2. For all n ≥ 2 we have

\[ 6 \log 2 \cdot \frac{n}{\log n} \ge \pi(n) \ge \frac{\log 2}{2} \cdot \frac{n}{\log n}. \]







Proof. Using theorem 3.A.1 for $N = \binom{2n}{n}$, we obtain

\[ \log N = \sum_{p \le 2n} v_p(N) \log p \le \sum_{p \le 2n} \lfloor \log_p(2n) \rfloor \log p \le \pi(2n) \log(2n). \]

Next, N is the largest among the $\binom{2n}{k}$ and $\sum_k \binom{2n}{k} = 4^n$, hence $N \ge \frac{4^n}{2n + 1}$.
We obtain

\[ \pi(2n) \ge \frac{\log N}{\log(2n)} \ge \frac{2n \log 2 - \log(2n + 1)}{\log(2n)}. \]

Using this inequality as well as π(2n + 1) ≥ π(2n), a small computation yields
the lower bound for n ≥ 6 as in the original statement: it is easy to check
directly the cases 2 ≤ n ≤ 5.

Using theorem 3.A.1 again yields

\[ n^{\pi(2n) - \pi(n)} \le \prod_{n < p \le 2n} p \le \binom{2n}{n} \le 4^n, \]

which applied to $n = 2^k$ gives

\[ \pi(2^{k+1}) - \pi(2^k) \le \frac{2^{k+1}}{k}. \]

Combined with the obvious inequality $\pi(2^{k+1}) \le 2^k$, this implies the inequality

\[ (k + 1)\pi(2^{k+1}) - k\pi(2^k) \le \pi(2^{k+1}) + 2^{k+1} \le 3 \cdot 2^k, \]

which easily implies that $\pi(2^n) \le \frac{3 \cdot 2^n}{n}$. Finally, we have

\[ \pi(n) \le \pi\left(2^{1 + \lfloor \log_2 n \rfloor}\right) \le \frac{3 \cdot 2^{1 + \lfloor \log_2 n \rfloor}}{1 + \lfloor \log_2 n \rfloor} \le \frac{6n}{\log_2 n}. \] □
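
The two bounds can be compared with the actual values of π(n) over a range of n. This is a sketch under the reading of the corollary as $(\log 2 / 2) \cdot n / \log n \le \pi(n) \le 6 \log 2 \cdot n / \log n$; the function names and the small floating-point tolerance are our own choices.

```python
from math import log

def prime_counts(limit):
    """Return a list pi with pi[n] = number of primes <= n, 0 <= n <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    counts, running = [], 0
    for flag in is_prime:
        running += flag
        counts.append(running)
    return counts

def within_chebyshev_bounds(n, pi_n):
    """Check the two-sided estimate for a single n >= 2."""
    lower = (log(2) / 2) * n / log(n)
    upper = 6 * log(2) * n / log(n)
    return lower - 1e-9 <= pi_n <= upper   # tolerance: the lower bound is attained at n = 2

pi = prime_counts(20000)
print(all(within_chebyshev_bounds(n, pi[n]) for n in range(2, 20001)))
```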


Before tackling the next result, we will prove a very elementary, yet
powerful inequality due to Erdős:


Theorem 3.A.3. For any n ≥ 2 we have $\prod_{p \le n} p \le 4^{n-1}$.

Proof. The proof is by strong induction, the inequality being clear for n = 2.
Suppose that it holds for all numbers smaller than or equal to n and let us
prove that $\prod_{p \le n+1} p \le 4^n$. If n + 1 is even, this is clear, so suppose that
n = 2k. Note that $\prod_{k+2 \le p \le 2k+1} p$ divides $\binom{2k+1}{k}$, so it is certainly at most
equal to $\binom{2k+1}{k}$. Combining this with the induction hypothesis for k + 1 gives

\[ \prod_{p \le n+1} p = \prod_{p \le k+1} p \cdot \prod_{k+2 \le p \le 2k+1} p \le 4^k \cdot \binom{2k+1}{k} \le 4^k \cdot 4^k = 4^n, \]

the last inequality being a consequence of

\[ 2 \cdot 4^k = (1 + 1)^{2k+1} \ge \binom{2k+1}{k} + \binom{2k+1}{k+1} = 2\binom{2k+1}{k}. \] □
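
Erdős' inequality is simple to confirm for small n by accumulating the product of primes. A minimal sketch (the function name is ours):

```python
def primorial_bound_holds(limit):
    """Check that the product of primes <= n is at most 4^(n-1) for 2 <= n <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    product = 1
    for n in range(2, limit + 1):
        if is_prime[n]:
            product *= n           # product of all primes <= n
        if product > 4 ** (n - 1):
            return False
    return True

print(primorial_bound_holds(500))
```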


A slightly trickier argument yields the following beautiful theorem of Mertens.


Theorem 3.A.4. We have

\[ \sum_{p \le n} \frac{\log p}{p} = \log n + O(1). \]


Proof. We will use the prime factorization of n!. Legendre's formula yields

\[ \frac{n}{p - 1} - \left( 1 + \frac{\log n}{\log p} \right) \le v_p(n!) \le \frac{n}{p - 1}. \]

Multiplying this by log p and summing over p ≤ n yields

\[ -\log \prod_{p \le n} p - \pi(n) \cdot \log n \le \log n! - n \sum_{p \le n} \frac{\log p}{p - 1} \le 0. \]

Using Erdős' inequality $\prod_{p \le n} p \le 4^n$, the previous estimates on π(n), and the
inequalities n log n ≥ log n! ≥ n(log n − 1) (the first one is obvious, the second
one follows easily by induction using the inequality $n \log\left(1 + \frac{1}{n}\right) < 1$) yields

\[ 8 \log 2 \ge \sum_{p \le n} \frac{\log p}{p - 1} - \log n \ge -1. \]

The theorem follows from this estimate and the fact that the series $\sum_p \frac{\log p}{p(p-1)}$
converges (since $\frac{\log p}{p(p-1)} < \frac{1}{p^{3/2}}$ if p is large enough). □


The next result is a simple application of Abel’s summation and of the 
previous theorem. 


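Theorem 3.A.5. We have

\[ \sum_{p \le n} \frac{1}{p} = \log\log n + O(1). \]


Proof. Define $a_n = \frac{\log n}{n}$ if n is a prime and 0 otherwise. By the previous
theorem there is a bounded sequence $r_n$ such that

\[ S_n = a_2 + \cdots + a_n = \log n + r_n. \]

Thus

\[ \sum_{p \le n} \frac{1}{p} = \sum_{k=2}^{n} \frac{S_k - S_{k-1}}{\log k} = \frac{S_n}{\log n} + \sum_{k=2}^{n-1} S_k \left( \frac{1}{\log k} - \frac{1}{\log(k + 1)} \right). \]

The triangle inequality yields

\[ \left| \sum_{k=2}^{n-1} r_k \left( \frac{1}{\log k} - \frac{1}{\log(k + 1)} \right) \right| = O(1), \]

so it remains to prove that

\[ \sum_{k=2}^{n-1} \left( 1 - \frac{\log k}{\log(k + 1)} \right) = \log\log n + O(1). \]

Note that for k ≥ 2

\[ 1 - \frac{\log k}{\log(k + 1)} = \frac{\log(k + 1) - \log k}{\log(k + 1)} = \frac{1}{\log(k + 1)} \int_k^{k+1} \frac{dt}{t}. \]

Hence

\[ 0 \le \int_k^{k+1} \frac{dt}{t \log t} - \left( 1 - \frac{\log k}{\log(k + 1)} \right) = \int_k^{k+1} \frac{\log(k + 1) - \log t}{t \log t \log(k + 1)} \, dt \le \frac{1}{k^2 (\log 2)^2}, \]

where we have used

\[ \log(k + 1) - \log t \le \log(k + 1) - \log(k) = \log(1 + 1/k) \le 1/k. \]

Since the sum of the upper bounds converges, we have

\[ \sum_{k=2}^{n-1} \left( 1 - \frac{\log k}{\log(k + 1)} \right) = \int_2^n \frac{dt}{t \log t} + O(1) = \log\log n + O(1). \] □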
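
The two theorems of Mertens are statements about bounded error terms, and one can at least watch these error terms numerically. The sketch below (function names ours) computes $\sum_{p \le n} \frac{\log p}{p} - \log n$ and $\sum_{p \le n} \frac{1}{p} - \log\log n$ for a few values of n; it illustrates the boundedness but of course proves nothing.

```python
from math import log

def primes_up_to(n):
    """List of primes <= n (simple sieve of Eratosthenes), for n >= 1."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def mertens_differences(n):
    """Return (sum log p / p - log n, sum 1/p - log log n) over primes p <= n."""
    primes = primes_up_to(n)
    first = sum(log(p) / p for p in primes) - log(n)
    second = sum(1 / p for p in primes) - log(log(n))
    return first, second

for n in (100, 1000, 10000):
    print(n, mertens_differences(n))
```

Both differences visibly stabilize as n grows, in line with the O(1) terms in Theorems 3.A.4 and 3.A.5.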


Remark 3.A.6. One can actually say much more and the true content of
Mertens' theorems is the following: there is a constant c such that

\[ \sum_{p \le n} \frac{1}{p} = \log\log n + c + O\left( \frac{1}{\log n} \right) \]

and if $\gamma = \lim_{n \to \infty} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n \right)$, then

\[ \prod_{p \le n} \left( 1 - \frac{1}{p} \right) = \frac{e^{-\gamma}}{\log n} + O\left( \frac{1}{\log^2 n} \right). \]

The first estimate is not difficult and uses again Abel's summation formula
combined with basic integral calculus. The second estimate is much trickier.


Addendum 3.B An Introduction to 
p-adic Numbers 


This rather long addendum is an introduction to the wonderful theory of 
p-adic numbers and their applications. This is a vast subject and the literature 
concerning it is huge, so that we cannot even properly scratch its surface. 
However, even a glimpse into the subject reveals amazing things... 


For a variety of reasons, reduction mod p is awfully insufficient. The best 
way to reduce modulo arbitrary powers of a prime p while still working in 
a reasonable algebraic (and especially analytic) context is using p-adic num- 
bers. Very roughly, a p-adic number is a kind of “analytic function of p” 
with coefficients taken mod p. So, p-adic numbers will be infinite expansions
$\sum_{k \gg -\infty} a_k p^k$, where $a_k$ are integers between 0 and p − 1. This is a mixture
of the classical idea of decimal expansion and the more analytic Taylor expan- 
sion of a nicely behaved complex-valued function around 0. The idea of seeing 
integers as analytic functions of primes is incredibly powerful and appears all 
the time in modern number theory. 


Though the best way to define the field of p-adic numbers is by completing 
Q with respect to the p-adic absolute value, we will introduce p-adic numbers 
algebraically and then develop their analytic properties. We think that this 
is a bit easier to digest. Next, we briefly discuss what happens when one 
takes a finite extension of the field of p-adic numbers, which gives us the 
opportunity to discuss valuations and absolute values from a more abstract 
(and useful) point of view. This discussion reveals a huge complete extension
of $\mathbb{Q}_p$, denoted $\mathbb{C}_p$ and called the field of complex p-adic numbers. Its mere
existence has some amazing consequences, for instance a beautiful geometric 
result of Monsky, concerning tiling of squares by triangles of the same area. 


After discussing classical analogues of the exponential and logarithm 
maps, we focus on the p-adic analogue of the complex Gamma function. This 
is the most technical part of the addendum, but also the most rewarding. It 
requires a preliminary discussion of Mahler expansions and p-adic integration, 
which are rather complicated, but once the machine is sufficiently developed 
one can prove deep congruences in a quite straightforward way. We highly 




recommend the wonderful books [13], [49] and [67] for a much more thorough 
treatment and many good examples. 


3.B.1 Arithmetic of p-adic integers and p-adic numbers 


A nice way (though maybe not the most intuitive) to see a p-adic integer
is to understand it as a compatible system of residue classes mod $p^n$, for all
$n$. That is, following Hensel, a p-adic integer is a sequence $(\bar{x}_n)_{n \ge 1}$, where
each $\bar{x}_n$ is a residue class mod $p^n$, say the class of an integer $a_n$, and where
$a_{n+1} \equiv a_n \pmod{p^n}$ for all $n$. This simply says that $\bar{x}_{n+1}$ maps to $\bar{x}_n$ under
the natural map $\mathbb{Z}/p^{n+1}\mathbb{Z} \to \mathbb{Z}/p^n\mathbb{Z}$. With this description, it is fairly clear
that the set of p-adic integers becomes a ring, if we define the addition and
multiplication component-wise (i.e. the sum of the sequences $(\bar{x}_n)_n$ and $(\bar{y}_n)_n$
is declared to be the sequence $(\bar{x}_n + \bar{y}_n)_n$, and similarly for multiplication). To
avoid useless repetitions, let us give a name to such sequences:


Definition 3.B.1. A sequence $(\bar{x}_n)_{n \ge 1}$, $\bar{x}_n \in \mathbb{Z}/p^n\mathbb{Z}$, is called compatible if
$x_{n+1} \equiv x_n \pmod{p^n}$ for all $n$, where $x_n \in \mathbb{Z}$ is any lifting of $\bar{x}_n$. Let $\mathbb{Z}_p$ be the
ring (for the previously defined operations) of all compatible sequences and
call it the ring of p-adic integers.


Note that $\mathbb{Z}$ lives inside $\mathbb{Z}_p$, as any integer $n$ can be thought of as the
compatible sequence $(n \pmod{p^k})_k$. The map sending $n$ to this sequence
gives an injective ring morphism from Z to Zp. We always identify an integer 
and its image in Zp. 
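To make the definition concrete, here is a small sketch (the function names `compatible_sequence` and `is_compatible` are ours, chosen for illustration) that lists the first components of an ordinary integer seen inside the ring of p-adic integers and checks the compatibility condition.

```python
def compatible_sequence(n, p, depth):
    """First `depth` components (n mod p, n mod p^2, ...) of the integer n."""
    return [n % p**k for k in range(1, depth + 1)]

def is_compatible(seq, p):
    """Check that the component mod p^(k+1) reduces to the component mod p^k."""
    return all((seq[k] - seq[k - 1]) % p**k == 0 for k in range(1, len(seq)))

minus_one = compatible_sequence(-1, 5, 4)
print(minus_one)                    # [4, 24, 124, 624]
print(is_compatible(minus_one, 5))  # True
```

Note how $-1$ is represented by the residues $p^k - 1$: negative integers need no sign in this description.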

It turns out that our new ring $\mathbb{Z}_p$ has very nice properties, both algebraically
and topologically, making it by far easier to handle than $\mathbb{Z}$ (you
might not have noticed, but $\mathbb{Z}$ is a very complicated object, actually...). We
have discussed the p-adic valuation for integers and rational numbers at length
and we will see that it naturally extends to $\mathbb{Z}_p$, making $\mathbb{Z}_p$ a beautiful place to
do analysis. However, we need some preliminaries in order to make this dream
a reality. The following result is crucial for the arithmetic of $\mathbb{Z}_p$ and shows the
big difference between $\mathbb{Z}$ and $\mathbb{Z}_p$ as far as arithmetic is concerned. Recall that
a unit in a ring is an element that has a multiplicative inverse in that ring.




Theorem 3.B.2. Any nonzero p-adic integer $x$ can be uniquely written
$x = p^k u$ for some nonnegative integer $k$ and some unit $u$.


Proof. Before going on to the proof, let us characterize the units of Zp. This 
will also play an important role in the proof of the theorem: 


Lemma 3.B.3. A compatible sequence $(\bar{x}_n)_n$ defines a unit in $\mathbb{Z}_p$ if and only
if its first component is nonzero.


Proof. One direction being obvious, let us assume that the first component is
nonzero. By compatibility, all $x_n$ are relatively prime to $p$, thus their classes
mod $p^n$ are invertible. Simply choose $\bar{y}_n$ to be the inverse of $\bar{x}_n$ mod $p^n$
and check that it forms a compatible sequence, which is the inverse of $x$ by
construction. □
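The proof of the lemma is completely explicit, as the following sketch shows for an ordinary integer $x$ prime to $p$. The function name `inverse_components` is ours; the only library feature used is Python's built-in three-argument `pow`, which (from Python 3.8 on) accepts the exponent $-1$ to compute modular inverses.

```python
def inverse_components(x, p, depth):
    """Components mod p, p^2, ..., p^depth of 1/x, for an integer x prime to p."""
    if x % p == 0:
        raise ValueError("first component is zero, so x is not a unit")
    return [pow(x, -1, p**n) for n in range(1, depth + 1)]

inv3 = inverse_components(3, 5, 4)
print(inv3)  # [2, 17, 42, 417]
# Each component inverts 3 modulo the corresponding power of 5 ...
print(all((3 * y) % 5**n == 1 for n, y in enumerate(inv3, start=1)))
# ... and consecutive components are compatible.
print(all((inv3[n] - inv3[n - 1]) % 5**n == 0 for n in range(1, len(inv3))))
```

So $1/3$ is a perfectly good 5-adic integer, although it is of course not an integer.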


To prove uniqueness of the representation given in the theorem, we need 
the following easy 


Lemma 3.B.4. If $x \in \mathbb{Z}_p$ and $p^m x = 0$ then $x = 0$.


Proof. By induction on $m$ we may assume that $m = 1$. Next, write $x$ as a
compatible sequence $(\bar{x}_n)_n$ and observe that the condition $px = 0$ simply says
that $p x_n \equiv 0$ in $\mathbb{Z}/p^n\mathbb{Z}$. This means that $p^{n-1} \mid x_n$ for each $n \ge 1$; in particular
$p^n \mid x_{n+1}$. But since $p^n$ divides $x_{n+1} - x_n$, we see necessarily that $p^n \mid x_n$, so in fact
all components of $x$ are zero. □


Assume now that $x = p^k u = p^l v$ for some units $u, v$ and some nonnegative
integers $k, l$. If $k > l$, lemma 3.B.4 yields $p^{k-l} u = v$. As $v$ is invertible (in
other words a unit), we have a contradiction with lemma 3.B.3. Similarly, we
cannot have $k < l$, so $k = l$. Applying lemma 3.B.4 once more, we get $u = v$,
which proves the uniqueness part of the theorem.

To prove the existence, write $x$ as a compatible sequence and let $m$ be
the largest integer $j$ such that $x_j \equiv 0 \pmod{p^j}$. Then $y_n = x_{n+m}/p^m$ are integers,
since by compatibility $x_{n+m} \equiv x_m \equiv 0 \pmod{p^m}$. Moreover, since $(x_n)_n$ is
compatible, so is $(y_n)_n$. Then by construction (and the compatibility of $(x_n)_n$),
the sequence $(y_n)_n$ defines a p-adic integer $y$ such that $p^m y = x$. We claim that
$y$ is a unit, which will finish the proof of the existence part of the theorem. But
the first component of $y$ does not vanish, so the result follows from lemma
3.B.3. □
Here is an important consequence of the theorem: 

Corollary 3.B.5. $\mathbb{Z}_p$ is an integral domain. In other words, if $ab = 0$ then
$a = 0$ or $b = 0$.


Proof. This follows immediately from the previous theorem and the second
lemma in its proof. □


If $R$ is an integral domain, one can define its field of fractions: formally,
it is the set of all symbols $\frac{a}{b}$ with $a, b \in R$ and $b \ne 0$, it being understood
that we identify $\frac{a}{b}$ and $\frac{c}{d}$ if $ad = bc$. Addition and multiplication of fractions
being done as in elementary school, this yields a field. Applying this to $\mathbb{Z}_p$,
we obtain the field of p-adic numbers

$$\mathbb{Q}_p = \left\{ \frac{a}{b} \;\middle|\; a, b \in \mathbb{Z}_p,\ b \ne 0 \right\} = \left\{ \frac{a}{p^n} \;\middle|\; a \in \mathbb{Z}_p,\ n \in \mathbb{N} \right\},$$

the second equality being a consequence of theorem 3.B.2.


3.B.2 The p-adic valuation revisited 


We will give a more analytic flavor to $\mathbb{Q}_p$ by endowing it with an absolute
value, which plays the same role as the usual absolute value on real numbers. 


Definition 3.B.6. Let $x \in \mathbb{Q}_p - \{0\}$ and write (according to theorem 3.B.2)
$x = p^k u$ for a unique unit $u$ and a unique integer $k$. Call $k = v_p(x)$ the p-adic
valuation of $x$ and $|x|_p = p^{-v_p(x)}$ the p-adic absolute value of $x$. Define
$|0|_p = 0$.


The following is an immediate consequence of the definition: 
Proposition 3.B.7. For all $x, y \in \mathbb{Q}_p$ we have
$$|xy|_p = |x|_p \cdot |y|_p \quad \text{and} \quad |x + y|_p \le \max(|x|_p, |y|_p),$$
with equality if $|x|_p \ne |y|_p$. Moreover, $|\cdot|_p$ extends the p-adic absolute value
on $\mathbb{Q} \subset \mathbb{Q}_p$.




Note that the inequality $|x + y|_p \le \max(|x|_p, |y|_p)$ satisfied by the p-adic
absolute value is stronger than the usual triangle inequality for real or
complex numbers. This has a whole variety of consequences, which make p- 
adic numbers a rather exotic object from a geometric point of view. However, 
this will not affect us, since we will deal with analytic aspects and for that it is 
enough to introduce a distance on $\mathbb{Q}_p$, a measure of how close numbers are to
one another. The distance between $x, y \in \mathbb{Q}_p$ is defined by $d(x, y) = |x - y|_p$.
We can then define analytic objects as in the “real world” (i.e. in the world of 
real numbers, but we will leave it to you to decide whether that is really the real 
world...). For instance, there are obvious notions of convergent sequences, 
continuous functions, etc. Basically, any real-analytic object has a p-adic 
counterpart. Just to see how it works, let us give the definition of a convergent 
sequence: 


Definition 3.B.8. Say a sequence of p-adic numbers $x_n$ converges to a p-adic
number $a$ if $|x_n - a|_p$ converges to 0 in the usual sense, that is for all $N \ge 1$
there is $n_0$ such that $|x_n - a|_p < 1/N$ for all $n \ge n_0$.


Intuitively, the sequence $x_n$ converges to $a$ if the difference $x_n - a$ is more
and more divisible by $p$ when $n$ is large, that is if $v_p(x_n - a)$ goes to infinity
as $n \to \infty$. If you think of a p-adic number as a compatible sequence, this
means that for any $k$, if $n$ is large enough (depending on $k$) then the first $k$
components of $x_n - a$ are zero. The following result is absolutely fundamental:


Theorem 3.B.9. If $x_n \in \mathbb{Q}_p$ converges to 0 then the series $\sum_{n \ge 0} x_n$ converges
in $\mathbb{Q}_p$. That is, the sequence whose general term is $x_0 + x_1 + \cdots + x_n$ converges
in $\mathbb{Q}_p$.

Note that this is NOT true for real numbers (think about the harmonic 
series!). Also, note the following important consequence: a sequence $x_n \in \mathbb{Q}_p$
converges if and only if $x_n - x_{n-1}$ tends to 0 in $\mathbb{Q}_p$, a fact that will be used a
lot in subsequent sections. 


Proof. Write $s_n = x_0 + x_1 + \cdots + x_n$, so that $s_n - s_{n-1}$ goes to 0. Note that we
may assume that all $x_n$ are p-adic integers: indeed, since $x_n$ goes to 0, $x_n$ is a
p-adic integer for $n$ large enough. Multiplying all $x_n$ by the same large power
of $p$ so that the first terms also become p-adic integers does not affect the
hypothesis or the conclusion. Next, write $s_i = (\bar{s}_{i1}, \bar{s}_{i2}, \ldots)$ as a compatible
sequence. Thinking of these sequences as infinite rows of some infinite matrix,
the crucial fact is the following:


Lemma 3.B.10. For any $n$ there exists $k_n$ such that $\bar{s}_{in} = \bar{s}_{jn}$ for all $i, j \ge k_n$.
That is, every column of this infinite matrix eventually becomes constant.


Proof. Indeed, note that for $i > j$ we have
$$v_p(s_i - s_j) = v_p(x_{j+1} + \cdots + x_i) \ge \inf_{k > j} v_p(x_k)$$
and the last quantity goes to infinity as $j \to \infty$. Thus for $i > j$ large enough
we have $v_p(s_i - s_j) \ge n$, which implies that $\bar{s}_{in} = \bar{s}_{jn}$. □


This lemma gives us a candidate for the limit of the sequence $s_n$: define
the sequence $a = (\bar{a}_1, \bar{a}_2, \ldots)$, where $\bar{a}_n$ is the common value of the elements
$\bar{s}_{in}$ for $i$ large enough (using the notations of the lemma we have $\bar{a}_n = \bar{s}_{k_n n}$).
It is then easy to check that this sequence is compatible and defines a p-adic
integer which is the limit of the sequence $s_n$. □
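As a quick illustration of the theorem, consider the series $1 + p + p^2 + \cdots$, whose terms tend to 0 p-adically. The sketch below (with helper names `valuation` and `partial_sum` of our own choosing) checks that its partial sums approach $1/(1-p)$: the quantity $(1-p)s_n - 1$ has p-adic valuation exactly $n + 1$.

```python
def valuation(m, p):
    """p-adic valuation of a nonzero integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

def partial_sum(p, n):
    """s_n = 1 + p + ... + p^n, computed as an ordinary integer."""
    return sum(p**i for i in range(n + 1))

p = 3
for n in range(6):
    # (1 - p) * s_n - 1 equals -p^(n+1), so its valuation grows with n.
    print(n, valuation((1 - p) * partial_sum(p, n) - 1, p))
```

In the real numbers the same partial sums of course diverge; p-adically they converge to $1/(1-p)$.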


The following basic result will be used frequently below. 


Proposition 3.B.11. $\mathbb{Z}_p$ is a compact subset of $\mathbb{Q}_p$.


Proof. Consider any sequence $x_n$ of elements of $\mathbb{Z}_p$. Since the first component
of $x_n$ (seen as a compatible sequence) takes only finitely many values, there
exists a subsequence $x_{\varphi_1(n)}$ and an integer $a_1$ such that $x_{\varphi_1(n)} \equiv a_1 \pmod{p}$
for all $n$. The same argument yields a subsequence $x_{\varphi_1(\varphi_2(n))}$ and an integer
$a_2$ such that $x_{\varphi_1(\varphi_2(n))} \equiv a_2 \pmod{p^2}$ for all $n$, etc. Considering

$$\varphi(n) = \varphi_1(\varphi_2(\cdots \varphi_n(n) \cdots)),$$

we obtain a subsequence such that $x_{\varphi(n)} \equiv a_k \pmod{p^k}$ for all $n \ge k$. It
follows that $(a_k \pmod{p^k})_k$ is a compatible sequence, defining a p-adic integer
$a$. By construction, we have $\lim_{n \to \infty} x_{\varphi(n)} = a$ and the result follows. □




Finally, let us give another fundamental property of p-adic integers, which 
shows that they are basically “formal power series in p” or “infinite base-p 
expansions.” 


Theorem 3.B.12. For any p-adic integer $x$ there exists a unique sequence
$a_n \in \{0, 1, \ldots, p-1\}$ such that

$$x = \sum_{n=0}^{\infty} a_n p^n.$$

By definition, this equality means that the sequence whose general term is
$a_0 + a_1 p + \cdots + a_n p^n$ converges to $x$. Moreover, if $a_n$ is the first nonzero term
of this sequence, then $v_p(x) = n$.


Proof. If $x$ is a p-adic integer, there exists a unique $a_0 \in \{0, 1, \ldots, p-1\}$
such that $x - a_0 \in p\mathbb{Z}_p$. Indeed, it is clear that $a_0$ has to be (the lifting to
$\{0, 1, \ldots, p-1\}$ of) the first term of the compatible sequence $x$. Using this
remark, we deduce by induction that for any $n$ there are unique $a_0, a_1, \ldots, a_n \in
\{0, 1, \ldots, p-1\}$ such that $x - (a_0 + a_1 p + \cdots + a_n p^n) \in p^{n+1}\mathbb{Z}_p$. But this implies
that $x = \lim_{n \to \infty} (a_0 + a_1 p + \cdots + a_n p^n)$. The rest is essentially immediate
using lemma 3.B.4 and theorem 3.B.2. □
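The inductive step of the proof is an algorithm: peel off the residue mod $p$, subtract it, divide by $p$, and repeat. Here is a minimal sketch for ordinary integers (the name `padic_digits` is ours); for negative integers the expansion does not terminate, so we only ask for a prescribed number of digits.

```python
def padic_digits(x, p, count):
    """First `count` digits a_0, a_1, ... of the integer x viewed in Z_p."""
    digits = []
    for _ in range(count):
        a = x % p         # the unique a in {0, ..., p-1} with x - a divisible by p
        digits.append(a)
        x = (x - a) // p  # exact division; continue with (x - a) / p
    return digits

print(padic_digits(10, 3, 4))  # [1, 0, 1, 0], i.e. 10 = 1 + 0*3 + 1*9
print(padic_digits(-1, 3, 5))  # [2, 2, 2, 2, 2], i.e. -1 = 2 + 2*3 + 2*9 + ...
```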


So any p-adic number $x$ can be uniquely written as a Laurent series
$$x = \sum_{k \ge -N} a_k p^k$$
for some $N$ and some $a_k \in \{0, 1, \ldots, p-1\}$. Moreover, we have the following
nice criterion to establish when $x \in \mathbb{Q}$. The proof is a bit tricky.


Proposition 3.B.13. The p-adic number
$$x = \sum_{k \ge -N} a_k p^k$$
is a rational number if and only if the sequence $(a_k)_k$ becomes periodic from a
certain point.




Proof. It is immediate to check that if $(a_k)_k$ is eventually periodic, then $x$ is
rational (simply because $p^a + p^{2a} + \cdots = \frac{p^a}{1 - p^a}$ in $\mathbb{Q}_p$ for any $a > 0$). The
amusing point is proving the converse. Multiplying $x$ by a power of $p$, we may
assume that $x \in \mathbb{Z}_p$, say $x = \sum_{n \ge 0} a_n p^n$. Write $x = \frac{u}{v}$ for some relatively
prime integers $u$ and $v$ and consider the sequence $x_k = \sum_{j \ge k} a_j p^{j-k}$. Then
clearly $x_k = a_k + p x_{k+1}$. As $x_0 = x$ is rational, it is clear that all $x_k$ are
rational. But much more is true: we claim that we can find $y_k \in \mathbb{Z}$ such
that $|y_k| \le \max(|u|, |v|)$ (using the ordinary absolute value!) and $x_k = \frac{y_k}{v}$.
Indeed, if this holds for $x_k$, then we can take $y_{k+1} = \frac{y_k - v a_k}{p}$ (clearly $|y_{k+1}| \le
\max(|u|, |v|)$; to see that $y_{k+1} \in \mathbb{Z}$, note that $x_k - a_k \in p\mathbb{Z}_p$, so $p$ must divide
$y_k - v a_k$). Now, the sequence $(y_k)_k$ is a bounded sequence of integers, so we
can find $i < j$ such that $y_i = y_j$. Then $x_i = x_j$ and by uniqueness (proved
in the previous theorem) we must have $a_{i+1} = a_{j+1}, a_{i+2} = a_{j+2}, \ldots$. This
finishes the proof. □
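The recursion $y_{k+1} = (y_k - v a_k)/p$ used in the proof can be run directly. The following sketch (function name `rational_digits` is ours, and we assume $v$ is prime to $p$, so that $u/v$ is a p-adic integer) returns both the digits and the bounded integer sequence $(y_k)$, whose first repetition forces the digits to become periodic.

```python
def rational_digits(u, v, p, count):
    """Digits of u/v in Z_p (v prime to p), following y_{k+1} = (y_k - v*a_k)/p."""
    v_inv = pow(v, -1, p)     # inverse of v mod p (Python 3.8+)
    y, digits, ys = u, [], []
    for _ in range(count):
        a = (y * v_inv) % p   # a_k is the residue mod p of x_k = y_k / v
        ys.append(y)
        digits.append(a)
        y = (y - v * a) // p  # exact, since p divides y_k - v*a_k
    return digits, ys

digits, ys = rational_digits(1, 3, 5, 7)
print(digits)  # [2, 3, 1, 3, 1, 3, 1]
print(ys)      # [1, -1, -2, -1, -2, -1, -2]
```

Here $y_1 = y_3$, so the digits of $1/3$ in base 5 repeat with period 2 from the second digit on, and indeed $|y_k| \le \max(|u|, |v|) = 3$ throughout.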


The following is also absolutely crucial. It basically says that in many 
cases solving a polynomial equation in p-adic numbers is the same as solving 
it mod p, since any solution mod p will automatically lift to a compatible 
sequence of solutions mod p”. 


Theorem 3.B.14. (Hensel's lemma) Let $f \in \mathbb{Z}_p[X]$ and let $a \in \mathbb{Z}_p$ be such
that $|f(a)|_p < 1$ and $|f'(a)|_p = 1$. Then there exists a unique $b \in \mathbb{Z}_p$ such that
$f(b) = 0$ and $|b - a|_p < 1$.


Proof. We will prove by induction that one can find a sequence of p-adic
integers $a_n$ with $a_0 = a$, $a_{n+1} \equiv a_n \pmod{p^{n+1}}$ and $v_p(f(a_n)) \ge n+1$. By
the previous theorem, the sequence $a_n$ will converge to a p-adic integer $b$ and
since $v_p(f(a_n)) \ge n+1$ and $f(a_n)$ converges to $f(b)$, then $f(b) = 0$. To prove
the existence of a sequence $a_n$, assume we constructed $a_0, \ldots, a_n$ and search
for $a_{n+1} = a_n + p^{n+1} b_n$, for some p-adic integer $b_n$. We need to ensure that
$f(a_n + p^{n+1} b_n) \equiv 0 \pmod{p^{n+2}}$. Since

$$f(a_n + p^{n+1} b_n) \equiv f(a_n) + p^{n+1} b_n f'(a_n) \pmod{p^{n+2}},$$

it is enough to take $b_n$ such that $f(a_n) + p^{n+1} b_n f'(a_n) \equiv 0 \pmod{p^{n+2}}$, which is
possible as $f'(a_n)$ is a unit (because $a_n \equiv a_0 \pmod{p}$ and $f'(a_0)$ is a unit). □
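The construction in the proof translates almost word for word into code. Below is a minimal sketch for polynomials with integer coefficients (the name `hensel_lift` and its calling convention are ours): at each step it solves the linear congruence for $b_n$ modulo $p$ and replaces $a_n$ by $a_n + p^{n+1} b_n$.

```python
def hensel_lift(f, df, a, p, steps):
    """Lift a root a of f mod p, with f'(a) a unit mod p, to a root mod p^(steps+1)."""
    if f(a) % p != 0 or df(a) % p == 0:
        raise ValueError("need f(a) = 0 (mod p) and f'(a) != 0 (mod p)")
    for n in range(steps):
        # Here f(a) is divisible by p^(n+1); solve t + b*f'(a) = 0 (mod p).
        t = (f(a) // p ** (n + 1)) % p
        b = (-t * pow(df(a), -1, p)) % p
        a = (a + p ** (n + 1) * b) % p ** (n + 2)
    return a

# A square root of 2 in the 7-adic integers, starting from 3^2 = 9 = 2 (mod 7).
r = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 9)
print(r % 7, (r * r - 2) % 7**10)  # 3 0
```

So 2 is a square in the 7-adic integers, and the other square root is obtained by starting from 4 instead of 3.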




3.B.3 Absolute values and their extensions 
Definitions and Ostrowski’s theorem 


Let us start with an easy observation: $\mathbb{Q}_p$ is not algebraically closed.
Indeed, the equation $x^2 = p$ has no solution in $\mathbb{Q}_p$, since if $x \in \mathbb{Q}_p$ satisfies
$x^2 = p$, then $p^{-1} = |p|_p = |x^2|_p = |x|_p^2$ and $|x|_p$ is of the form $p^{-a}$ for an integer
$a$, a contradiction. Thus, it is meaningful to study finite extensions of $\mathbb{Q}_p$, as
one is often interested in solving polynomial equations over $\mathbb{Q}_p$. It turns out
that all finite extensions of $\mathbb{Q}_p$ also have natural absolute values that extend
the absolute value of $\mathbb{Q}_p$, though this is not trivial at all. It is thus better to
abstract the situation, using the following


Definition 3.B.15. 1. An absolute value on a field $K$ is a map $|\cdot| : K \to
\mathbb{R}_+$ such that $|x| = 0$ if and only if $x = 0$, $|xy| = |x| \cdot |y|$ and $|x + y| \le
|x| + |y|$. The absolute value is called non-archimedean if $|x + y| \le
\max(|x|, |y|)$. The absolute value is called trivial if $|x| = 1$ whenever
$x \ne 0$.


2. Two absolute values $|\cdot|_1$ and $|\cdot|_2$ are called equivalent if there exists $c > 0$ such that
$|x|_2 = |x|_1^c$ for all $x$.


3. A valuation on a field $K$ is a map $v : K \to \mathbb{R} \cup \{\infty\}$ such that $v(x) = \infty$
if and only if $x = 0$, $v(xy) = v(x) + v(y)$ and $v(x + y) \ge \min(v(x), v(y))$.


It is clear that if $|\cdot|$ is a non-archimedean absolute value, then $v(x) = -\log|x|$ defines a
valuation on $K$. If $|\cdot|$ is a non-archimedean absolute value on $K$, the ring of integers
of $K$ is by definition $O_K = \{x \in K : |x| \le 1\}$. It is easy to check that this is
a ring and that $m_K = \{x \in K : |x| < 1\}$ is a maximal ideal of $O_K$. Thus
$k_K = O_K / m_K$ is a field, called the residue field of $K$.


It is clear that any non-archimedean absolute value is bounded by 1 on
$\mathbb{Z}$, but the nice and somewhat tricky fact is that the converse holds. Indeed,
if $|n| \le 1$ for all integers $n$, then for all $x, y$ and all $n \ge 1$ we can write

$$|x + y|^n = |(x + y)^n| = \left| \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \right| \le \sum_{k=0}^{n} |x|^k |y|^{n-k} \le (n+1) \max(|x|, |y|)^n.$$

Taking the nth root of this inequality and letting $n \to \infty$ yields $|x + y| \le
\max(|x|, |y|)$, proving the claim. With these remarks being made, we are ready
to prove the following beautiful result:


Theorem 3.B.16. (Ostrowski) Any nontrivial norm on Q is equivalent to the 
p-adic absolute value for some prime p or to the usual absolute value. 


Proof. Suppose first that the absolute value $|\cdot|$ is non-archimedean. We know
that $|m/n| = |m|/|n| \ne 1$ for some nonzero $m, n \in \mathbb{Z}$. Without loss of generality,
then, let $|m| < 1$, so that for some prime $p \mid m$ we have $|p| < 1$. If $q$ is
another prime with $|q| < 1$, then we may find integers $a$ and $b$ with $ap + bq = 1$,
and then

$$1 = |1| = |ap + bq| \le \max(|ap|, |bq|) \le \max(|p|, |q|) < 1,$$

a contradiction. We conclude that $|n| = |p|^{v_p(n)}$, and so $|\cdot|$ is equivalent to
the p-adic absolute value.
The difficult case is when $|\cdot|$ is archimedean. We saw that in this case
there is an integer $n > 1$ such that $|n| > 1$. Pick any such $n$ and write for
all integers $x > 1$ the number $x$ in base $n$, say $x = x_0 + x_1 n + \cdots + x_k n^k$. Note that
$k \le \log_n x$ and that if $C_n = \max_{1 \le j \le n-1} |j|$, then

$$|x| \le |x_0| + |x_1||n| + \cdots + |x_k||n|^k \le C_n \frac{|n|^{k+1} - 1}{|n| - 1} \le A x^{\log_n |n|}$$

for some constant $A$, independent of $x$. Applying this to $x^N$, taking Nth roots and
letting $N \to \infty$ yields $|x| \le x^{\log_n |n|}$.




Now, we claim that for any integer $x > 1$ we have $|x| > 1$. Indeed, if
$|x| \le 1$, by writing $n^j$ in base $x$ and using the same argument as before, we
deduce that

$$|n|^j = |n^j| \le C(1 + \log_x n^j).$$

As $|n| > 1$, this is certainly not true for $j$ large enough, proving the claim.

We have, therefore, that for all $x, n > 1$ both $|x| \le x^{\log_n |n|}$ and $|n| \le
n^{\log_x |x|}$, so that (e.g.) the first inequality is in fact an equality. This implies
that $\log_n |n|$ is a constant function of $n > 1$. Thus, there is $d$ such that $|n| = n^d$
for all integers $n > 1$ and the result follows. □


Extensions of absolute values 


We prove now the following fundamental and nontrivial theorem. The 
proof is pretty acrobatic and uses a nice mixture of analytic and algebraic 
arguments. 


Theorem 3.B.17. Let $K$ be a finite extension of $\mathbb{Q}_p$. There is a unique
extension of the absolute value on $\mathbb{Q}_p$ to an absolute value on $K$. This absolute
value is non-archimedean and it is given by

$$|x| = \sqrt[n]{|N_{K/\mathbb{Q}_p}(x)|_p}$$

if $x \in K$, where $n = [K : \mathbb{Q}_p]$ and $N_{K/\mathbb{Q}_p}$ is the norm.


Proof. We prove uniqueness first. We claim that for any two absolute values
$|\cdot|_1, |\cdot|_2$ on $K$ that extend $|\cdot|_p$ on $\mathbb{Q}_p$, one can find $c_1, c_2 > 0$ such that
$c_1 |x|_1 \le |x|_2 \le c_2 |x|_1$ for all $x$. Assume that this happens for a moment. Then

$$c_1 |x|_1^n = c_1 |x^n|_1 \le |x^n|_2 = |x|_2^n \le c_2 |x|_1^n,$$

and taking nth roots and letting $n \to \infty$ we get $|x|_1 = |x|_2$, proving the
uniqueness part. Now, to prove the claim, it is enough to prove the following: 
if $e_1, e_2, \ldots, e_n$ is a basis of $K$ over $\mathbb{Q}_p$ and if we define

$$\left| \sum_{i=1}^{n} x_i e_i \right|_\infty = \max_i |x_i|_p,$$

then there are $c_1, c_2 > 0$ such that $c_1 |x|_\infty \le |x|_1 \le c_2 |x|_\infty$. But clearly for
$$x = \sum_{i=1}^{n} x_i e_i$$
we have
$$|x|_1 \le \sum_{i=1}^{n} |x_i|_p \cdot |e_i|_1 \le \left( \sum_{i=1}^{n} |e_i|_1 \right) |x|_\infty,$$
so we can take $c_2 = \sum_i |e_i|_1$. Obtaining $c_1$ is more subtle. Let
$$S = \{x \in K : |x|_\infty = 1\}.$$

If we equip $K$ with the product topology induced from its vector-space structure
over $\mathbb{Q}_p$, then $S$ is easily seen to be a bounded, closed subset of $K$, whence
compact. Moreover the map $x \mapsto |x|_1$ is continuous on $S$, as say

$$\big| |x|_1 - |y|_1 \big| \le |x - y|_1 \le c_2 |x - y|_\infty.$$

Because this map does not vanish on $S$, there is $c_1 > 0$ such that $|x|_1 \ge c_1$ for
$x \in S$. As any nonzero $x \in K$ can be scaled by an element of $\mathbb{Q}_p$ to become an element
of $S$, the claim is proved.


Existence is harder. Defining $|x| = \sqrt[n]{|N_{K/\mathbb{Q}_p}(x)|_p}$, standard properties
of the norm yield $|xy| = |x| \cdot |y|$, $|x| = 0 \Leftrightarrow x = 0$ and $|x| = |x|_p$ for $x \in \mathbb{Q}_p$.
The difficult point is proving that $|x + y| \le \max(|x|, |y|)$. By multiplicativity,
it is enough to prove that $|x + 1| \le \max(1, |x|)$, which reduces to

if $|x| \le 1$, then $|x + 1| \le 1$.

This is however quite subtle. We will actually prove the following result: there
exists $c > 0$ such that $|x + 1| \le c$ whenever $|x| \le 1$. Assume that we proved
this for a moment (note that $c \ge 1$, as one sees by taking $x = 0$). Applying it to $x/y$
or $y/x$ (according to whether $|y| \ge |x|$ or not), we deduce that for all $x, y$ we have
$|x + y| \le c \max(|x|, |y|)$. But then for all $d$, splitting the sum of $d + 1$ terms
repeatedly into halves and using $\left| \binom{d}{i} \right| \le 1$, we have

$$|x + y|^d = |(x + y)^d| \le c^{1 + \log_2(d+1)} \max_{0 \le i \le d} \left| \binom{d}{i} x^{d-i} y^i \right| \le c^{1 + \log_2(d+1)} (\max(|x|, |y|))^d.$$

Taking dth roots and letting $d \to \infty$, we obtain $|x + y| \le \max(|x|, |y|)$, finishing
the proof.

It remains to prove the existence of $c$. This is similar to the first part
of the proof. Namely, let $e_1, \ldots, e_n$ be a basis of $K$ over $\mathbb{Q}_p$ (we may assume $e_1 = 1$,
so that $|1|_\infty = 1$) and let $|\cdot|_\infty$ be as above. As the norm of an element of $K$ is a
polynomial expression of the coordinates of that element in the basis $e_1, \ldots, e_n$,
it follows that $x \mapsto |x|$ is a continuous map. Since it does not vanish on the compact
set $\{x : |x|_\infty = 1\}$, there are positive numbers $a, b$ such that $a \le |x| \le b$ whenever
$|x|_\infty = 1$. An obvious scaling argument implies that $a |x|_\infty \le |x| \le b |x|_\infty$ for all $x$,
from where

$$|1 + x| \le b |1 + x|_\infty \le b \left( 1 + \frac{|x|}{a} \right) \le b + \frac{b}{a}$$

whenever $|x| \le 1$. The existence of $c$ is thus proved and we are done. □


The uniqueness property in the previous theorem ensures that if $K \subset L$
are two finite extensions of $\mathbb{Q}_p$, then the restriction to $K$ of the unique absolute
value on $L$ extending that on $\mathbb{Q}_p$ is the unique absolute value on $K$ extending
that on $\mathbb{Q}_p$. This implies the following very useful


Corollary 3.B.18. Fix an algebraic closure $\overline{\mathbb{Q}}_p$ of $\mathbb{Q}_p$. There is a unique
extension of $|\cdot|_p$ to a non-archimedean absolute value on $\overline{\mathbb{Q}}_p$.


From $\overline{\mathbb{Q}}_p$ to $\mathbb{C}_p$


We have bad news: after all the hard work in the previous section, we
have to tell you that $\overline{\mathbb{Q}}_p$ is not a very good object. When dealing with p-adic
numbers, analysis is used intensively and finite extensions of $\mathbb{Q}_p$ are very good
places to do analysis because they are complete. This means that all Cauchy
sequences converge in such a finite extension (this also happens in $\mathbb{R}$ or $\mathbb{C}$
or in a compact interval, but not in $(0, 1)$ for instance: the sequence $1/n$ is
Cauchy, but does not converge to an element of $(0, 1)$). On the other hand,
$\overline{\mathbb{Q}}_p$ is not complete (this is not really easy to prove, actually, but those with a
good analytic background will observe that it follows immediately from Baire's
lemma), so one cannot do reasonable analysis on this field.




Let us explain why finite extensions of $\mathbb{Q}_p$ are complete, since this is very
important. The fact that $\mathbb{Q}_p$ is complete was essentially proved while proving
theorem 3.B.9. To see that a finite extension $K$ is complete for the unique
absolute value $|\cdot|$ extending $|\cdot|_p$, choose a basis $e_1, \ldots, e_n$ of $K$ over $\mathbb{Q}_p$
and define $|x|_\infty = \max_i |x_i|_p$ if $x = \sum_i x_i e_i$ and $x_i \in \mathbb{Q}_p$. This is a norm
on $K$ (but not an absolute value) and the same argument as in the first
paragraph of the proof of theorem 3.B.17 shows that there exist $c_1, c_2 > 0$ such
that $c_1 |x| \le |x|_\infty \le c_2 |x|$ for all $x$. Thus, the notions of Cauchy sequence and
convergent sequence are the same for $|\cdot|$ and $|\cdot|_\infty$. But it is clear (from the
fact that $\mathbb{Q}_p$ is complete) that $K$ is complete for $|\cdot|_\infty$, thus $K$ is complete for
$|\cdot|$, too.


Now, we would like to have a field that contains $\overline{\mathbb{Q}}_p$ and which is still
complete. It turns out that there exists such a field which is moreover minimal.
The situation is very similar to that of $\mathbb{Q}$ endowed with the p-adic absolute
value: it is not complete for this absolute value, but adding all possible limits
of all Cauchy sequences in $\mathbb{Q}$ one ends up with a much bigger field, $\mathbb{Q}_p$. One
can play the same game starting with $\overline{\mathbb{Q}}_p$ and one ends up with a huge field
$\mathbb{C}_p$, endowed with an absolute value $|\cdot|_p$ extending that on $\overline{\mathbb{Q}}_p$ and having the
properties:


1) The field $\mathbb{C}_p$ is complete with respect to $|\cdot|_p$, that is any Cauchy sequence
in $\mathbb{C}_p$ (with respect to $|\cdot|_p$) converges in $\mathbb{C}_p$. Just as in theorem 3.B.9,
we deduce that if $a_n \in \mathbb{C}_p$ is a sequence converging to 0, then $\sum_{n \ge 0} a_n$
converges in $\mathbb{C}_p$ (the notion of convergence is defined just as for $\mathbb{Q}_p$).


2) The field $\mathbb{C}_p$ is algebraically closed, in particular it contains $\overline{\mathbb{Q}}_p$. Moreover,
$\overline{\mathbb{Q}}_p$ is dense in $\mathbb{C}_p$.


3) The residue field of $\mathbb{C}_p$ is an algebraic closure of $\mathbb{F}_p$.


Here is the way one constructs $\mathbb{C}_p$: consider the set $C$ of all Cauchy sequences
in $\overline{\mathbb{Q}}_p$. It is easy to check that this is actually a ring, addition and multiplication
being defined component-wise. Next, one checks that the subset $m$ of
$C$ consisting of sequences that converge to 0 is a maximal ideal of $C$. One
defines $\mathbb{C}_p = C/m$. By definition, this is a field. It has a natural absolute
value $|\cdot|_p$, defined by: if $x \in \mathbb{C}_p$ is the class of a Cauchy sequence $(a_n)_n$, then
$|x|_p = \lim_{n \to \infty} |a_n|_p$. One checks that this is well-defined (i.e. independent of
the choice of the sequence $(a_n)_n$) and it is an absolute value that extends the
one on $\overline{\mathbb{Q}}_p$. It is not difficult to prove that $\mathbb{C}_p$ is complete for this absolute
value and essentially by definition $\overline{\mathbb{Q}}_p$ is dense in $\mathbb{C}_p$. It is more difficult to
prove that $\mathbb{C}_p$ is an algebraically closed field.

The previous construction does not use any special property of $\overline{\mathbb{Q}}_p$ except
the fact that it has an absolute value. In general, for any field $K$ endowed
with an absolute value, one can construct (in exactly the same way as above)
a complete field $\widehat{K}$ that contains $K$ and has an absolute value $|\cdot|$ extending that of $K$,
such that $K$ is dense in $\widehat{K}$. This field is called the completion of $K$. If $K$ is
algebraically closed, then $\widehat{K}$ is also algebraically closed, though this is not so
easy to prove.


A summary 


The upshot of this technical section is that finite extensions of $\mathbb{Q}_p$ behave
as $\mathbb{Q}_p$ both algebraically and analytically, that $\overline{\mathbb{Q}}_p$ is a pretty bad field from
the analytic point of view and that if one wants to do analysis, then one has to
work with its completion $\mathbb{C}_p$. Moreover, in $\mathbb{C}_p$ a series $\sum_n a_n$ converges if and
only if $a_n$ converges to 0 (which means that $\lim_{n \to \infty} |a_n|_p = 0$) and then one
can permute its terms as one wishes and still get the same value of the sum
(this is definitely wrong in real or complex analysis!). Finally, one can deal
with double sums in a rather leisurely way, since if $\lim_{\max(m,n) \to \infty} a_{mn} = 0$, then

$$\sum_{m} \left( \sum_{n} a_{mn} \right) = \sum_{n} \left( \sum_{m} a_{mn} \right)$$

and all series converge. Note that we did not prove the last two assertions,
but since they are rather easy, we leave them as good exercises for the reader.


3.B.4 p-adic analogues of classical functions 


Recall that for any complex number $x$, the series $\sum_{n \ge 0} \frac{x^n}{n!}$ converges to a
complex number called $e^x$ and $x \mapsto e^x$ is a surjective group morphism $\mathbb{C} \to \mathbb{C}^*$.
Let us study the p-adic analogue of this construction: the problem is that
$v_p(n!)$ is quite large, so we cannot expect that the previous series converges
for all $x$. Actually, by theorem 3.B.9 the previous series converges for some
$x \in \mathbb{C}_p$ if and only if $v_p\left( \frac{x^n}{n!} \right) \to \infty$. Using Legendre's formula $v_p(n!) = \frac{n - s_p(n)}{p - 1}$,
where $s_p(n) = O(\log n)$ is the sum of digits of $n$ when written in base $p$, we
deduce that the series converges iff

$$\lim_{n \to \infty} \left[ n \left( v_p(x) - \frac{1}{p-1} \right) + \frac{s_p(n)}{p-1} \right] = \infty,$$

which happens if and only if $v_p(x) > \frac{1}{p-1}$, i.e. $|x|_p < p^{-\frac{1}{p-1}}$. Moreover, one can
easily check (using the remark on double sums made in the previous section)
that if $x, y$ satisfy these conditions, then so does $x + y$ and $e^x \cdot e^y = e^{x+y}$.
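Legendre's formula, on which the convergence criterion rests, is easy to test by brute force. In the sketch below the helper names `valuation` and `digit_sum` are ours; the valuation of $n!$ is computed directly from the factorial rather than from any closed formula, so the comparison is a genuine check.

```python
from math import factorial

def valuation(m, p):
    """p-adic valuation of a nonzero integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

def digit_sum(n, p):
    """Sum s_p(n) of the digits of n written in base p."""
    s = 0
    while n:
        s += n % p
        n //= p
    return s

# Check (p - 1) * v_p(n!) = n - s_p(n) for small n and a few primes.
ok = all(
    (p - 1) * valuation(factorial(n), p) == n - digit_sum(n, p)
    for p in (2, 3, 5, 7)
    for n in range(1, 120)
)
print(ok)  # True
```

In particular $v_p(n!)$ is always strictly smaller than $n/(p-1)$ for $n \ge 1$, which is the source of the radius $p^{-1/(p-1)}$ above.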

It turns out that one can construct an inverse to the exponential map,
which is however defined on all of $\mathbb{C}_p^*$. More precisely, we have the following
nontrivial


Theorem 3.B.19. There exists a unique continuous homomorphism $\log_p :
\mathbb{C}_p^* \to \mathbb{C}_p$ such that $\log_p(p) = 0$ and

$$\log_p(x) = \sum_{n \ge 1} (-1)^{n-1} \frac{(x - 1)^n}{n}$$

for $|x - 1|_p < 1$.


Proof. (sketch) The proof is pretty long, so we only give the main steps. The 
crucial point is the following 


Lemma 3.B.20. Any $x \in \mathbb{C}_p^*$ can be uniquely written $x = p^r \cdot \zeta \cdot v$ for some
$r \in \mathbb{Q}$, $\zeta$ a root of unity of order prime to $p$ and $v \in \mathbb{C}_p$ such that $|v - 1| < 1$.


Proof. Let us prove the existence part. By construction, $v_p(\mathbb{C}_p^*) = \mathbb{Q}$, so that
given any $x \in \mathbb{C}_p^*$ there is $r \in \mathbb{Q}$ and $u \in \mathbb{C}_p^*$ such that $x = p^r \cdot u$ and $v_p(u) = 0$.
Consider the image of $u$ in the residue field $\overline{\mathbb{F}}_p$ of $\mathbb{C}_p$. It is a nonzero element
of some $\mathbb{F}_q$ for some power $q$ of $p$. Thus $v_p(u^{q-1} - 1) > 0$ and then easily
$u^{(q-1)q^n} \to 1$ as $n \to \infty$. We see similarly that $\zeta = \lim u^{q^n}$ converges and
then $\zeta^{q-1} = 1$ with $v_p(u - \zeta) > 0$. So one can take $v = u/\zeta$.




For uniqueness, it is clear that $r = v_p(x)$ is uniquely determined. It is
thus enough to check that no root of unity $\zeta \ne 1$ of order prime to $p$ satisfies
$|1 - \zeta| < 1$. If $\zeta$ has order $n > 1$, then the norm (from $\mathbb{Q}_p(\zeta)$ to $\mathbb{Q}_p$) of $1 - \zeta$
is a divisor of $n$, and so cannot be a multiple of $p$. □


Now, let us study $\log_p$. Let $x \in \mathbb{C}_p^*$ and write $x = p^r \cdot \zeta \cdot v$ as in the
lemma. Note that if we admit that $\log_p$ exists, then necessarily $N \log_p(\zeta) =
\log_p(\zeta^N) = 0$ if $\zeta^N = 1$, so necessarily $\log_p(\zeta) = 0$. As $\log_p(p) = 0$, we must
have

$$\log_p x = \log_p(v) = \sum_{n \ge 1} (-1)^{n-1} \frac{(v - 1)^n}{n}.$$

This shows that if $\log_p$ exists, then it is unique.


It is harder to prove existence. First, by the previous paragraph we must
define

$$\log_p x = \log_p(v) = \sum_{n \ge 1} (-1)^{n-1} \frac{(v - 1)^n}{n}$$

if $x = p^r \cdot \zeta \cdot v$. Note that the series converges, as

$$v_p\left( \frac{(v - 1)^n}{n} \right) \ge n v_p(v - 1) - \frac{\log n}{\log p} \to \infty$$

(here $\log$ is the ordinary real logarithm, and we use $v_p(n) \le \frac{\log n}{\log p}$).
Moreover, since the series converges uniformly, it is easy to see that $v \mapsto
\log_p(v)$ is continuous for $|v - 1| < 1$. From here it is not difficult to check that
$x \mapsto \log_p(x)$ is continuous on $\mathbb{C}_p^*$. It remains to check that it is additive. This
easily reduces to

$$\log_p(1 + u) + \log_p(1 + v) = \log_p(1 + (u + v + uv))$$

for $|u| < 1$ and $|v| < 1$, which is the tricky point. First, one checks that as
formal series in $X, Y$ we have

$$\log(1 + X) + \log(1 + Y) = \log(1 + (X + Y + XY)),$$

for instance by differentiating both sides in $X$, respectively $Y$. Next, the series
defining $\log_p(1 + u)$, $\log_p(1 + v)$ and $\log_p(1 + u + v + uv)$ converge absolutely
and one can permute their terms as one wants, without changing the value of
the series. This implies that we can substitute $X = u$ and $Y = v$ in the formal
series equality and finishes the proof of the theorem. □


With the same arguments as in the proof of the previous theorem we
obtain $\log_p(e^x) = x$ if $|x|_p < p^{-\frac{1}{p-1}}$ (it is easy to check that $|e^x - 1|_p < 1$ for
such $x$) and $e^{\log_p(x)} = x$ if $x$ is close enough to 1 so that $v_p(\log_p(x)) > \frac{1}{p-1}$.


We end this section with another useful p-adic analogue, the binomial
functions and power functions. Define, for $x \in \mathbb{Q}_p$ and $n \ge 0$

$$\binom{x}{n} = \frac{x(x - 1) \cdots (x - n + 1)}{n!}.$$


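Proposition 3.B.21. 1) (Vandermonde's identity) If $x, y \in \mathbb{Q}_p$, then

$$\binom{x + y}{n} = \sum_{k=0}^{n} \binom{x}{k} \binom{y}{n - k}.$$

2) If $x \in \mathbb{Z}_p$, then $\binom{x}{n} \in \mathbb{Z}_p$ for all $n$.
3) If $a \in \mathbb{C}_p$ satisfies $|a|_p < 1$ and $x \in \mathbb{Z}_p$, define

$$(1 + a)^x = \sum_{n \ge 0} \binom{x}{n} a^n.$$

Then the series converges and $x \mapsto (1 + a)^x$ is a continuous
homomorphism from the additive group $\mathbb{Z}_p$ to $\mathbb{C}_p^*$.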


Proof. 1) If $x, y$ are positive integers, simply compare coefficients in
$$(1 + T)^{x+y} = (1 + T)^x \cdot (1 + T)^y.$$

The result then follows by density and continuity. The same argument works
for 2). The convergence of the series in 3) follows immediately from 2) and
theorem 3.B.9. The continuity follows from the uniform convergence of the
series, while the equality $(1 + a)^x \cdot (1 + a)^y = (1 + a)^{x+y}$ follows either by a
simple computation using 1) or from the case $x, y \in \{1, 2, \ldots\}$ by continuity
and density. □




Some applications 


We discuss here some immediate applications of the preceding theoret- 
ical results. The reader will probably appreciate better the power of p-adic 
techniques, since none of the following problems are easy to solve by other 
means. 


Example 3.B.22. (Kiran Kedlaya, USA TST) Let $p > 5$ be a prime and let

$$f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2}.$$

Prove that for any integers $x, y$, $p^3$ divides the numerator of $f_p(x) - f_p(y)$ when
written in lowest terms.


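Proof. Using the tools previously introduced, this is very simple: working in
$\mathbb{Q}_p$, we can write

$$f_p(x) = \sum_{k=1}^{p-1} \frac{1}{k^2}\left(1 + \frac{px}{k}\right)^{-2}
= \sum_{k=1}^{p-1} \frac{1}{k^2} \sum_{j \ge 0} \binom{-2}{j} \frac{p^j x^j}{k^j}$$
$$\equiv \sum_{k=1}^{p-1} \frac{1}{k^2}\left(1 - \frac{2px}{k} + \frac{3p^2x^2}{k^2}\right)
= \sum_{k=1}^{p-1} \frac{1}{k^2} - 2px \sum_{k=1}^{p-1} \frac{1}{k^3} + 3p^2x^2 \sum_{k=1}^{p-1} \frac{1}{k^4} \pmod{p^3}.$$

It suffices thus to show that

$$\sum_{k=1}^{p-1} \frac{1}{k^3} \equiv 0 \pmod{p^2} \quad \text{and} \quad \sum_{k=1}^{p-1} \frac{1}{k^4} \equiv 0 \pmod{p},$$

but these congruences have already been discussed in the solution of problem
22, chapter 3. □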
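The statement of example 3.B.22 is easy to test with exact rational arithmetic. The sketch below is only an illustration (the function names `f` and `numerator_divisible` are ours): it computes $f_p(x) - f_p(y)$ as a reduced fraction and checks whether $p^3$ divides its numerator. It also shows that the hypothesis $p > 5$ matters, since for $p = 5$ the sum $\sum 1/k^4$ is no longer divisible by $p$.

```python
from fractions import Fraction


def f(p, x):
    """f_p(x) = sum_{k=1}^{p-1} 1 / (p*x + k)^2 as an exact rational number."""
    return sum(Fraction(1, (p * x + k) ** 2) for k in range(1, p))


def numerator_divisible(p, x, y):
    """True if p^3 divides the numerator of f_p(x) - f_p(y) in lowest terms."""
    diff = f(p, x) - f(p, y)          # Fraction keeps the result in lowest terms
    return diff.numerator % p ** 3 == 0


if __name__ == "__main__":
    print(all(numerator_divisible(7, x, y)
              for x in range(-3, 4) for y in range(-3, 4)))   # expected: True
    print(numerator_divisible(5, 1, 0))                        # p = 5 is excluded
```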




Example 3.B.23. (how not to prove Fermat's last theorem) Let $p$ be a prime
and let $k, N \ge 1$. There exist integers $x, y, z$, not all of them divisible by $p$
and such that $x^N + y^N \equiv z^N \pmod{p^k}$.

Proof. It is enough to show the existence of $x, z \in \mathbb{Z}_p$ such that $x^N + 1 = z^N$,
since then $x \pmod{p^k}$, $1$ and $z \pmod{p^k}$ is a solution. We would like to take
$z = (1 + x^N)^{1/N}$. Using the results of the previous section, we are tempted
to take $z = \sum_{n \ge 0} \binom{1/N}{n} x^{nN}$. Unfortunately, $N$ is not necessarily prime to $p$, so
we cannot apply directly those results. However,

$$v_p\left(\binom{1/N}{n} x^{nN}\right) \ge Nn\, v_p(x) - \frac{n}{p-1} - n\, v_p(N)$$

and this tends to $\infty$ as $n \to \infty$ if $N v_p(x) > \frac{1}{p-1} + v_p(N)$. We thus choose such
$x$ and define $z$ by the previous series. Then $z \in \mathbb{Z}_p$ (by the previous estimate)
and the usual argument with formal series shows that $z^N = 1 + x^N$. □
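The construction in this proof is completely explicit, so it can be carried out by machine. The following Python sketch is an illustration under stated assumptions: it sums finitely many terms of $\sum_n \binom{1/N}{n} x^{nN}$ exactly, then reduces the result modulo $p^k$ (the names `binom_series_root` and `reduce_mod` are ours, and the number of terms is simply taken large enough for the chosen modulus). It relies on `pow(d, -1, m)` for modular inverses, available in Python 3.8 and later.

```python
from fractions import Fraction


def binom_series_root(x, N, terms):
    """Partial sum of sum_{n>=0} C(1/N, n) x^(nN), a candidate for (1 + x^N)^(1/N)."""
    total = Fraction(0)
    coeff = Fraction(1)               # coeff = C(1/N, n), starting with n = 0
    t = Fraction(x) ** N
    power = Fraction(1)               # power = t^n
    for n in range(terms):
        total += coeff * power
        coeff *= (Fraction(1, N) - n) / (n + 1)   # C(1/N, n+1) from C(1/N, n)
        power *= t
    return total


def reduce_mod(q, m):
    """Image in Z/mZ of a rational q whose denominator is invertible mod m."""
    return q.numerator * pow(q.denominator, -1, m) % m


if __name__ == "__main__":
    # p = 3, N = 3, x = 3: here N*v_p(x) = 3 > 1/(p-1) + v_p(N) = 3/2.
    p, N, k, x = 3, 3, 10, 3
    z = reduce_mod(binom_series_root(x, N, 40), p ** k)
    print(z, (x ** N + 1 - z ** N) % p ** k)   # second value should be 0
```

The triple $(x, 1, z)$ is then a solution of $x^N + y^N \equiv z^N \pmod{p^k}$ in which $y = 1$ is not divisible by $p$.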


The next problem has already been discussed in chapter 3, problem 26,
where two rather difficult solutions were given. Using 2-adic numbers, it becomes
almost obvious.

Example 3.B.24. Write

$$\frac{2}{1} + \frac{2^2}{2} + \cdots + \frac{2^n}{n} = \frac{a_n}{b_n}$$

for relatively prime integers $a_n, b_n$. Then $v_2(a_n) \ge n - \log_2(n)$ (this is the
ordinary logarithm here!) for $n \ge 2$.

Proof. Let us work in $\mathbb{Q}_2$. The series $\sum_k \frac{2^k}{k}$ suggests considering $\log_2(-1)$.
Indeed, the series defining this is exactly $-\sum_k \frac{2^k}{k}$. On the other hand, since
$\log_2$ is additive and since $(-1)^2 = 1$ and $\log_2(1) = 0$, we must have $\log_2(-1) = 0$,
that is in $\mathbb{Q}_2$ we have the equality $\sum_{k \ge 1} \frac{2^k}{k} = 0$. But then

$$v_2\left(\sum_{k=1}^{n} \frac{2^k}{k}\right) = v_2\left(-\sum_{k>n} \frac{2^k}{k}\right) \ge \inf_{k>n} (k - \log_2 k) \ge n - \log_2(n). \qquad \square$$
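The inequality of example 3.B.24 can be checked directly for small $n$. This short Python sketch (the helper names `v2` and `numerator_valuation` are illustrative) computes the partial sums exactly and compares the 2-adic valuation of the numerator with $n - \log_2 n$.

```python
from fractions import Fraction
from math import log2


def v2(m):
    """2-adic valuation of a nonzero integer m."""
    v = 0
    while m % 2 == 0:
        m //= 2
        v += 1
    return v


def numerator_valuation(n):
    """v_2(a_n), where a_n / b_n = sum_{k=1}^n 2^k / k in lowest terms."""
    s = sum(Fraction(2 ** k, k) for k in range(1, n + 1))
    return v2(s.numerator)


if __name__ == "__main__":
    for n in range(2, 21):
        print(n, numerator_valuation(n), round(n - log2(n), 3))
```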




3.B.5 A geometric application


In this section we reward the reader with a mathematical gem, due to Paul
Monsky. This uses the existence of an absolute value on $\mathbb{C}_p$ extending the one
on $\mathbb{Q}_p$, a result which was explained in previous sections. It is a nontrivial
fact from field theory that $\mathbb{C}_p$ is isomorphic as a field with $\mathbb{C}$. The choice of
an isomorphism allows us to transfer the absolute value on $\mathbb{C}_p$ to one on $\mathbb{C}$,
that still extends the $p$-adic absolute value on $\mathbb{Q}$. The reader who finds this
construction very indirect will probably spend some time trying to construct
directly such an absolute value on $\mathbb{C}$. Inevitable failure will probably convince
him of the power of the arguments in previous sections.


Theorem 3.B.25. (Monsky) One cannot dissect a square into an odd number 
of triangles of the same area. 


It is absolutely remarkable that no geometric proof is known for this pretty 
innocent-looking problem. Monsky’s proof (see [53]) is a stunning combination 
of arithmetic and combinatorics. 


Proof. Consider the square with vertices $(0,0)$, $(1,0)$, $(0,1)$, $(1,1)$. Using the
extension of the 2-adic valuation to $\mathbb{R}$, color the point $(x,y) \in \mathbb{R}^2$ in red if
$\max(|x|_2, |y|_2) < 1$, in blue if $|x|_2 \ge \max(1, |y|_2)$ and in green if $|y|_2 > |x|_2$
and $|y|_2 \ge 1$. We will repeatedly use the easy observation that translation by
a red point is color-preserving.

Here is the crucial point: 


Lemma 3.B.26. If $T$ is a triangle whose vertices have three different colors,
then $|A(T)|_2 > 1$, where $A(T)$ is the area of $T$.

Proof. By the remark on translations by red points, we may assume that one
of the vertices of $T$ is $(0,0)$. Let $b = (b_1, b_2)$ and $c = (c_1, c_2)$ be the other
vertices, say $b$ is blue and $c$ is green. Then

$$|A(T)|_2 = \left|\frac{b_1c_2 - b_2c_1}{2}\right|_2 = 2\,|b_1|_2\,|c_2|_2 \cdot \left|1 - \frac{b_2c_1}{b_1c_2}\right|_2 = 2\,|b_1|_2\,|c_2|_2 > 1,$$

as $|b_1|_2, |c_2|_2 \ge 1$ and $\left|\frac{b_2c_1}{b_1c_2}\right|_2 < 1$. □
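On points with rational coordinates the coloring only involves the ordinary 2-adic valuation, so the lemma can be tested by brute force. The sketch below is an illustration restricted to rational points (the general statement needs the extension of the valuation to $\mathbb{R}$, which is not constructive); the names `color`, `area` and `rainbow_triangles_have_big_area` are ours. It uses $|t|_2 < 1 \iff v_2(t) > 0$.

```python
from fractions import Fraction
from itertools import combinations


def v2(q):
    """2-adic valuation of a rational q, with v2(0) = +infinity."""
    q = Fraction(q)
    if q == 0:
        return float("inf")
    v, num, den = 0, q.numerator, q.denominator
    while num % 2 == 0:
        num //= 2
        v += 1
    while den % 2 == 0:
        den //= 2
        v -= 1
    return v


def color(x, y):
    """Monsky's coloring of a rational point, expressed with valuations."""
    vx, vy = v2(x), v2(y)
    if vx > 0 and vy > 0:
        return "red"                  # max(|x|_2, |y|_2) < 1
    if vx <= 0 and vx <= vy:
        return "blue"                 # |x|_2 >= 1 and |x|_2 >= |y|_2
    return "green"                    # |y|_2 > |x|_2 and |y|_2 >= 1


def area(a, b, c):
    """Area of the triangle abc as an exact rational number."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    return abs(Fraction((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) / 2)


def rainbow_triangles_have_big_area(points):
    """Check |A(T)|_2 > 1, i.e. v2(A(T)) < 0, for every three-colored triangle."""
    for a, b, c in combinations(points, 3):
        if {color(*a), color(*b), color(*c)} == {"red", "blue", "green"}:
            if not v2(area(a, b, c)) < 0:
                return False
    return True


if __name__ == "__main__":
    grid = [(Fraction(i, 6), Fraction(j, 6)) for i in range(7) for j in range(7)]
    print(rainbow_triangles_have_big_area(grid))
```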










Consider now a dissection of the square into n triangles of the same area, 
which is necessarily 1/n. Color only the vertices of the triangles, as above. 
If we can prove that there is a triangle with vertices of different colors, we 
deduce from the previous lemma that $|n|_2 < 1$ and so $n$ is even. The existence
of such a triangle is a trivial consequence of Sperner’s lemma, but it is perhaps 
useful to recall how things work in this easy two-dimensional case: consider 
segments on the boundary of the square whose endpoints are red and blue 
(i.e. one endpoint is red and the other one blue). All vertices on the edge 
[0,1] x {0} are either red or blue. All vertices on the edge {0} x [0,1] are 
either red or green. All vertices on the edges [0,1] x {1} or {1} x [0,1] are 
either blue or green. Therefore all segments on the sides with one endpoint 
red and the other blue are on the side [0,1] x {0}. As (0,0) is red and (1,0) 
is blue, there must be an odd number of such segments. On the other hand, 
assume that no triangle has vertices of different colors. It is easy to check that 
all triangles have an even number of sides whose endpoints are red and blue. 
As the triangles partition the square, we deduce that the number of red-blue 
segments on the border of the square is even, a contradiction. Thus, there
must be a “colorful” triangle and the theorem is proved. □


3.B.6 Mahler expansions 


One of the miracles of p-adic analysis is that one has a fairly explicit 
description of all continuous functions on Zp. Of course, this is far from 
being true in real or complex analysis, so the following theorem is surprising 
to say the least. It is however absolutely crucial when dealing with more 
delicate aspects of p-adic numbers and we will use it constantly in the following 
sections. 


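Theorem 3.B.27. (Mahler) For any continuous function $f : \mathbb{Z}_p \to \mathbb{Q}_p$ there
is a unique sequence $(a_n(f))_{n \ge 0}$ of $p$-adic numbers such that $\lim_{n \to \infty} a_n(f) = 0$
and for all $x \in \mathbb{Z}_p$

$$f(x) = \sum_{n \ge 0} a_n(f) \binom{x}{n}.$$

Moreover, we have $\min_{x \in \mathbb{Z}_p} v_p(f(x)) = \min_{n \ge 0} v_p(a_n(f))$.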




Proof. Note that if the equality

$$f(x) = \sum_{n \ge 0} a_n(f) \binom{x}{n}$$

holds for all $x \in \mathbb{Z}_p$, then for all $n$

$$f(n) = \sum_{k=0}^{n} a_k(f) \binom{n}{k}.$$

Either by considering the exponential generating function of $(f(n))_n$ and
$(a_n(f))_n$ or by using the theory of finite differences (see chapter 10, section
10.3), we deduce that

$$a_n(f) = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} f(k).$$

Assume for a moment that we proved that $\lim_{n \to \infty} a_n(f) = 0$, which is
the difficult point of the theorem. Then, since $\binom{x}{n} \in \mathbb{Z}_p$ for $x \in \mathbb{Z}_p$, we deduce
that $g(x) = \sum_{n \ge 0} a_n(f) \binom{x}{n}$ converges uniformly for $x \in \mathbb{Z}_p$ and so $g$ is a
continuous function. Moreover, by construction $g(n) = f(n)$ for all $n \ge 1$,
so by density of $\{1, 2, \ldots\}$ in $\mathbb{Z}_p$ we obtain $f = g$ and the first part of the
theorem follows. Finally, from the previous relations between the values of $f$
at positive integers and the $a_n(f)$ we obtain

$$v_p(f(n)) \ge \min_{0 \le k \le n} v_p(a_k(f)), \qquad v_p(a_n(f)) \ge \min_{0 \le k \le n} v_p(f(k)),$$

so another density argument yields $\min_{x \in \mathbb{Z}_p} v_p(f(x)) = \min_{n \ge 0} v_p(a_n(f))$.
Note that those minima exist, as $v_p(a_n(f))$ diverges to $\infty$ and as $f$ is continuous
on the compact set $\mathbb{Z}_p$.

It remains to prove that $v_p(a_n(f)) \to \infty$. As $f$ is bounded (because it is
continuous and $\mathbb{Z}_p$ is compact), by multiplying $f$ by some power of $p$ we may
assume that $f(\mathbb{Z}_p) \subset \mathbb{Z}_p$. As $\mathbb{Z}_p$ is compact, $f$ is uniformly continuous on $\mathbb{Z}_p$,
so there is $n_0$ such that $v_p(f(x + p^{n_0}) - f(x)) \ge 1$ for all $x \in \mathbb{Z}_p$. Let

$$\Delta f(x) = f(x+1) - f(x),$$

then

$$\Delta^n f(x) = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} f(x+k)$$

and $a_n(f) = \Delta^n f(0)$. As $p$ divides $\binom{p^{n_0}}{k}$ for all $1 \le k < p^{n_0}$, it follows that
$v_p(\Delta^{p^{n_0}} f(x)) \ge 1$ for all $x \in \mathbb{Z}_p$ and so $v_p(\Delta^n f(x)) \ge 1$ for all $n \ge p^{n_0}$ and
all $x$. The map $g = \frac{1}{p}\Delta^{p^{n_0}} f$ is continuous and $g(\mathbb{Z}_p) \subset \mathbb{Z}_p$. Applying the
same argument to $g$, we find $n_1$ such that $v_p(\Delta^{p^{n_1}} g(x)) \ge 1$ for all $x$. Then
$v_p(\Delta^n f(x)) \ge 2$ for all $n \ge p^{n_0} + p^{n_1}$. Continuing like this, we find integers $n_i$
such that $v_p(\Delta^n f(x)) \ge d$ for all $n \ge p^{n_0} + \cdots + p^{n_{d-1}}$ and all $x \in \mathbb{Z}_p$. Taking
$x = 0$ shows that $v_p(a_n(f)) \to \infty$ and finishes the proof. □
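The formula $a_n(f) = \sum_k (-1)^{n-k}\binom{n}{k} f(k)$ makes Mahler coefficients easy to compute from the values of $f$ at nonnegative integers. Here is a minimal Python sketch (the function names are illustrative) that computes the first coefficients and evaluates the resulting expansion at an integer. For $f(x) = 4^x = (1+3)^x$ one recovers $a_n = 3^n$, in agreement with proposition 3.B.21, and these coefficients visibly tend to 0 in $\mathbb{Q}_3$.

```python
from math import comb


def mahler_coefficients(f, count):
    """a_n(f) = sum_{k=0}^n (-1)^(n-k) C(n,k) f(k) for n = 0, ..., count-1."""
    return [sum((-1) ** (n - k) * comb(n, k) * f(k) for k in range(n + 1))
            for n in range(count)]


def evaluate(coeffs, n):
    """sum_k a_k C(n,k) at an integer n >= 0 (exact when n < len(coeffs))."""
    return sum(a * comb(n, k) for k, a in enumerate(coeffs))


if __name__ == "__main__":
    print(mahler_coefficients(lambda x: x ** 3, 6))   # x^3 = C(x,1) + 6C(x,2) + 6C(x,3)
    print(mahler_coefficients(lambda x: 4 ** x, 5))   # coefficients of (1+3)^x
```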


Remark 3.B.28. The numbers $a_n(f)$ are called the Mahler coefficients of the
function $f$. We discussed only the case of $\mathbb{Q}_p$-valued functions, but the result
holds for $K$-valued functions, where $K$ is any complete subfield of $\mathbb{C}_p$, with
basically the same proof.

Here is a nice application of the previous theorem, proposed at the USA 
TST 2011 by Josh Nichols-Barrer. 
Example 3.B.29. Let $p$ be a prime. We say that a sequence of integers $\{z_n\}_{n=0}^{\infty}$
is a $p$-pod if for each $e \ge 0$, there is an $N \ge 0$ such that whenever $m \ge N$, $p^e$
divides the sum

$$\sum_{k=0}^{m} (-1)^k \binom{m}{k} z_k.$$

Prove that if both sequences $\{x_n\}_{n=0}^{\infty}$ and $\{y_n\}_{n=0}^{\infty}$ are $p$-pods, then the
sequence $\{x_n y_n\}_{n=0}^{\infty}$ is a $p$-pod.


Proof. See the sequence $z_n$ as a map on $\mathbb{N}$ in the obvious way. The Mahler
coefficients of this map are precisely the numbers

$$a_m(z) = \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} z_k.$$

By hypothesis, $z$ is a $p$-pod if and only if $a_m(z)$ tends to 0 in $\mathbb{Q}_p$ and by
Mahler's theorem this happens if and only if $z_n$ extends to a continuous function
on $\mathbb{Z}_p$ (namely $x \mapsto \sum_{n \ge 0} a_n(z) \cdot \binom{x}{n}$). But the pointwise product of two
continuous functions on $\mathbb{Z}_p$ is clearly a continuous function, from which the
result follows. □

One also has a characterization of locally analytic functions $f : \mathbb{Z}_p \to \mathbb{Q}_p$
in terms of their Mahler coefficients, though the proof is much more difficult.
Recall that a function $f : \mathbb{Z}_p \to \mathbb{Q}_p$ is called locally analytic if for any $a \in \mathbb{Z}_p$
there exists $n_a$ and $p$-adic numbers $f^{(n)}(a)$ such that whenever $v_p(x - a) \ge n_a$
we have

$$f(x) = \sum_{n \ge 0} \frac{f^{(n)}(a)}{n!} (x - a)^n.$$

Theorem 3.B.30. (Amice) A continuous function $f : \mathbb{Z}_p \to \mathbb{Q}_p$ is locally
analytic on $\mathbb{Z}_p$ if and only if its Mahler coefficients $a_n(f)$ satisfy³

$$\limsup_{n \to \infty} \sqrt[n]{|a_n(f)|_p} < 1.$$


3.B.7 The p-adic Gamma function: preliminaries 


Recall that the Gamma function is defined for $\mathrm{Re}(s) > 0$ by

$$\Gamma(s) = \int_0^{\infty} e^{-t} t^s \frac{dt}{t}.$$

It has the nice property that it interpolates the numbers $n!$, as $\Gamma(n) = (n-1)!$
for any positive integer $n$. One would like to have a $p$-adic analogue of the Gamma function, as
this function plays an amazingly important role in real and complex analysis.
Unfortunately, it is not difficult to see that there is no continuous function
$f : \mathbb{Z}_p \to \mathbb{Q}_p$ such that $f(n) = (n-1)!$ for all positive integers $n$. On the
other hand, we have the following beautiful result, due to Morita. To simplify
the exposition, we will assume from now on that $p > 2$. There are natural
extensions of all the following results to the case when $p = 2$, but that would
force us to discuss two cases in both the statement and proof of the following
results.


³ This means that there is $r > 0$ such that $v_p(a_n(f)) \ge rn$ for all sufficiently large $n$.




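Theorem 3.B.31. (Morita) There is a unique continuous map $\Gamma_p : \mathbb{Z}_p \to \mathbb{Q}_p$
such that for all $n \ge 2$ we have

$$\Gamma_p(n) = (-1)^n \prod_{\substack{1 \le j < n \\ \gcd(j,p)=1}} j.$$

Proof. Defining

$$g(n) = (-1)^n \prod_{\substack{1 \le j < n \\ \gcd(j,p)=1}} j$$

for $n \ge 2$, let us prove the following

Lemma 3.B.32. $g(n + p^k) \equiv g(n) \pmod{p^k}$ for all $n$ and all $k \ge 1$.

We have

$$g(n) - g(n + p^k) = (-1)^n \prod_{\substack{j=1 \\ \gcd(j,p)=1}}^{n-1} j \cdot \Bigg(1 + \prod_{\substack{j=n \\ \gcd(j,p)=1}}^{n+p^k-1} j\Bigg),$$

so it is enough to check that

$$1 + \prod_{\substack{j=n \\ \gcd(j,p)=1}}^{n+p^k-1} j \equiv 0 \pmod{p^k}.$$

But if $\pi : \mathbb{Z} \to \mathbb{Z}/p^k\mathbb{Z}$ is the natural reduction map, it is clear that

$$\pi\Bigg(\prod_{\substack{j=n \\ \gcd(j,p)=1}}^{n+p^k-1} j\Bigg) = \prod_{g \in G} g,$$

where $G = (\mathbb{Z}/p^k\mathbb{Z})^*$. The elements $g$ in the previous product come in pairs
$(g, g^{-1})$, but one has to pay attention to the fact that one might have $g^2 = 1$.
However, as $p > 2$, this appears precisely when $g = 1$ or $g = -1$. Thus, the
product of all $g$'s equals $-1$ and we are done.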

The previous lemma easily implies that $v_p(g(m) - g(n)) \ge v_p(m - n)$
for all distinct positive integers $m$ and $n$. Choose any $p$-adic integer $a$ and
any sequence $x_n$ of positive integers such that $\lim_{n \to \infty} x_n = a$ in $\mathbb{Z}_p$. Since
$v_p(g(x_i) - g(x_j)) \ge v_p(x_i - x_j)$, it follows that the sequence $(g(x_n))_n$ is a Cauchy
sequence and so it converges to some $p$-adic integer $g(a)$. If $y_n$ is another
sequence that converges to $a$, then applying the result we have just obtained
to the sequence $x_1, y_1, x_2, y_2, \ldots$, we deduce that $g(y_n)$ converges to $g(a)$, i.e.
$g(a)$ is independent of the choice of the sequence $(x_n)_n$. Thus, we obtain a map
$\Gamma_p : \mathbb{Z}_p \to \mathbb{Z}_p$ which clearly extends $g$. Passing to the limit in the inequality
$v_p(g(m) - g(n)) \ge v_p(m - n)$, we deduce that $v_p(\Gamma_p(x) - \Gamma_p(y)) \ge v_p(x - y)$
for all $x, y \in \mathbb{Z}_p$, showing that $\Gamma_p$ is continuous. This proves the existence of
$\Gamma_p$. The uniqueness part is a trivial consequence of the density of $\mathbb{N}$ in $\mathbb{Z}_p$. □
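Lemma 3.B.32 is a statement about ordinary integers, so it can be verified by direct computation. The following Python sketch (with illustrative names `g` and `lemma_holds`) evaluates $g(n)$ from its definition and checks the congruence $g(n + p^k) \equiv g(n) \pmod{p^k}$ for a range of $n$. It assumes $p$ odd, as in the text.

```python
def g(n, p):
    """(-1)^n times the product of the integers 1 <= j < n that are prime to p."""
    prod = 1
    for j in range(1, n):
        if j % p != 0:
            prod *= j
    return (-1) ** n * prod


def lemma_holds(p, k, n_max):
    """Check g(n + p^k) = g(n) (mod p^k) for all 2 <= n <= n_max."""
    q = p ** k
    return all((g(n + q, p) - g(n, p)) % q == 0 for n in range(2, n_max + 1))


if __name__ == "__main__":
    print(g(5, 3))                  # (-1)^5 * 1 * 2 * 4 = -8
    print(lemma_holds(3, 2, 20), lemma_holds(5, 2, 30))
```

Since $\Gamma_p$ is the continuous extension of $g$, the same routine gives $\Gamma_p(n) \bmod p^k$ for any integer $n \ge 2$.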


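The following proposition summarizes the basic properties of $\Gamma_p$. We will
prove much deeper results in later sections, once we have developed enough
tools.

Proposition 3.B.33. 1) For all positive integers $n$ we have

$$\Gamma_p(n+1) = (-1)^{n+1} \frac{n!}{\lfloor n/p \rfloor!\; p^{\lfloor n/p \rfloor}}.$$

2) $\Gamma_p(\mathbb{Z}_p) \subset \mathbb{Z}_p^*$.
3) If $\tau(x) = -x$ for $x \in \mathbb{Z}_p^*$ and $\tau(x) = -1$ for $x \in p\mathbb{Z}_p$, then
$\Gamma_p(x+1) = \tau(x)\Gamma_p(x)$.
4) If $x \in \mathbb{Z}_p$ and $r(x) \in \{1, 2, \ldots, p\}$ is the unique integer such that
$x - r(x) \in p\mathbb{Z}_p$, then

$$\Gamma_p(x) \cdot \Gamma_p(1 - x) = (-1)^{r(x)}.$$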




Proof. 1) This follows immediately by definition of the $p$-adic Gamma function.

2) By construction, $v_p(\Gamma_p(n)) = 0$ for integers $n \ge 2$. As these integers
form a dense subset of $\mathbb{Z}_p$ and as $v_p \circ \Gamma_p$ is continuous, 2) follows.

3) This follows immediately from the definition if $x$ is a positive integer.
The general case follows by density and continuity.

4) By density and continuity, it suffices to prove that

$$\Gamma_p(-n)\Gamma_p(n+1) = (-1)^{n+1+\lfloor n/p \rfloor}$$

for positive integers $n$. But multiplying the relations

$$\Gamma_p(1 - j) = \tau(-j)\Gamma_p(-j)$$

from 3) yields

$$\frac{1}{\Gamma_p(-n)} = \prod_{j=1}^{n} \tau(-j) = \prod_{p \mid j} (-1) \prod_{\gcd(p,j)=1} j = (-1)^{\lfloor n/p \rfloor} (-1)^{n+1} \Gamma_p(n+1)$$

and the result follows. □


3.B.8 Mahler expansions and discrete antiderivatives 


Exploiting Mahler’s theorem, we develop basic p-adic calculus in this sec- 
tion. This will then be applied to the p-adic Gamma function and then to 
establish some fairly deep congruences. We start by proving that any contin- 
uous p-adic function has a continuous (discrete) antiderivative. 


Theorem 3.B.34. Let $f : \mathbb{Z}_p \to \mathbb{C}_p$ be a continuous function. There exists a
unique continuous function $Sf : \mathbb{Z}_p \to \mathbb{C}_p$ such that $Sf(0) = 0$ and

$$Sf(x+1) - Sf(x) = f(x)$$

for all $x \in \mathbb{Z}_p$. Moreover $a_n(Sf) = a_{n-1}(f)$ for $n \ge 1$, where $a_n(g)$ are
Mahler's coefficients of $g$.




Proof. This is very easy using Mahler's theorem 3.B.27. Namely, look for

$$Sf(x) = \sum_{n \ge 0} b_n \binom{x}{n}$$

and observe that

$$Sf(x+1) - Sf(x) = \sum_{n \ge 0} b_n \left(\binom{x+1}{n} - \binom{x}{n}\right) = \sum_{n \ge 0} b_{n+1} \binom{x}{n}.$$

Taking into account the condition that $Sf(0) = 0$, we have $b_0 = 0$. Also,
$Sf$ satisfies the desired equation if and only if $b_{n+1} = a_n(f)$ for all $n \ge 0$.
Thus any solution must satisfy $a_n(Sf) = a_{n-1}(f)$ for $n \ge 1$ and $a_0(Sf) = 0$,
yielding uniqueness. But this also gives the existence, since if $a_n(f) \to 0$, then
$a_{n-1}(f) \to 0$ and so $Sf(x) = \sum_{n \ge 0} b_n \binom{x}{n}$ is a continuous function if $f$ is. □
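On nonnegative integers, $Sf(n)$ is just the partial sum $f(0) + \cdots + f(n-1)$, and the shift of Mahler coefficients can be checked directly. The following Python sketch is illustrative only (function names are ours): it builds $Sf(n)$ from $a_{k-1}(f)$ and compares it with the plain sum.

```python
from math import comb


def mahler_coefficients(f, count):
    """a_n(f) = sum_{k=0}^n (-1)^(n-k) C(n,k) f(k) for n = 0, ..., count-1."""
    return [sum((-1) ** (n - k) * comb(n, k) * f(k) for k in range(n + 1))
            for n in range(count)]


def indefinite_sum(f, n, count):
    """Sf(n) = sum_{k>=1} a_{k-1}(f) C(n,k); exact for integers 0 <= n <= count."""
    a = mahler_coefficients(f, count)
    return sum(a[k - 1] * comb(n, k) for k in range(1, count + 1))


if __name__ == "__main__":
    f = lambda x: x * x + 1
    print([indefinite_sum(f, n, 8) for n in range(6)])
    print([sum(f(j) for j in range(n)) for n in range(6)])   # same list
```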


Next, we define the notion of integrable and $C^1$ class functions. Unfortunately,
there isn't really a very good notion of $p$-adic integration and each
such construction has its limitations. For instance, the one we have chosen in 
this book has the drawback that not all continuous functions are integrable. 
But since we will deal only with sufficiently nice functions, this will be enough 
for our purposes. 


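Definition 3.B.35. 1) A function $f : \mathbb{Z}_p \to \mathbb{C}_p$ is called Volkenborn integrable
(or simply integrable) if

$$\int_{\mathbb{Z}_p} f(x)\,dx = \lim_{n \to \infty} \frac{1}{p^n} \sum_{j=0}^{p^n - 1} f(j)$$

exists in $\mathbb{C}_p$.
2) Say a continuous function $f : \mathbb{Z}_p \to \mathbb{C}_p$ is of class $C^1$ if

$$\lim_{n \to \infty} n \cdot |a_n(f)|_p = 0.$$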




These definitions might seem a bit strange at first. However, the average 
whose limit defines the integral of f should be seen as a Riemann sum. So 
Volkenborn integrable functions are precisely those for which Riemann sums 
converge. The definition of $C^1$ class functions is a bit more subtle, but one can
prove (although this is nontrivial) that it agrees with the intuitive definition. 
We will prove part of this assertion when proving the next theorem. The 
following remark will play a very important role in the proof. 


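Remark 3.B.36. If $x \ne y \in \mathbb{Z}_p$, then

$$\left|\binom{x}{n} - \binom{y}{n}\right|_p \le \frac{|x - y|_p}{|\mathrm{lcm}(1, 2, \ldots, n)|_p}.$$

Indeed, Vandermonde's identity (proposition 3.B.21) yields

$$\binom{x}{n} - \binom{y}{n} = \sum_{k=1}^{n} \binom{x-y}{k}\binom{y}{n-k} = (x - y) \sum_{k=1}^{n} \frac{1}{k} \binom{x-y-1}{k-1}\binom{y}{n-k},$$

which immediately yields the result (using the fact that $\binom{z}{n} \in \mathbb{Z}_p$ if $z \in \mathbb{Z}_p$).
Note that

$$|\mathrm{lcm}(1, 2, \ldots, n)|_p = p^{-\lfloor \log_p n \rfloor} \ge \frac{1}{n},$$

so we obtain the very useful estimate

$$\left|\binom{x}{n} - \binom{y}{n}\right|_p \le n\,|x - y|_p.$$

The following result is fairly technical, so the reader might want to skip
the proof at a first reading.

Theorem 3.B.37. A function $f$ of class $C^1$ is integrable and

$$\int_{\mathbb{Z}_p} f(x)\,dx = \sum_{n \ge 0} \frac{(-1)^n}{n+1} a_n(f).$$

Moreover, in this case we also have:

1) $f$ is continuously differentiable, i.e. $f'(x) = \lim_{y \to x} \frac{f(y) - f(x)}{y - x}$ exists in
$\mathbb{C}_p$ for all $x$ and $x \mapsto f'(x)$ is continuous.

2) We have

$$\int_{\mathbb{Z}_p} (f(x + u) - f(u))\,du = Sf'(x)$$

for all $x \in \mathbb{Z}_p$.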








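Proof. The identity

$$\frac{1}{p^n} \sum_{j=0}^{p^n - 1} f(j) = \frac{1}{p^n} Sf(p^n) = \frac{1}{p^n} \sum_{k \ge 1} a_{k-1}(f) \binom{p^n}{k} = \sum_{k \ge 1} \frac{a_{k-1}(f)}{k} \binom{p^n - 1}{k-1}$$

yields the estimate

$$\left|\frac{1}{p^n} \sum_{j=0}^{p^n - 1} f(j) - \sum_{k \ge 0} \frac{(-1)^k a_k(f)}{k+1}\right|_p \le \sup_{k \ge 1} x_{n,k},$$

where

$$x_{n,k} = \frac{|a_{k-1}(f)|_p}{|k|_p} \left|\binom{p^n - 1}{k-1} - (-1)^{k-1}\right|_p.$$

Thus, to establish the first formula of the theorem, it is enough to check that
$\sup_k x_{n,k}$ tends to 0 as $n \to \infty$. Clearly, $\frac{1}{|k|_p} \le k$, so $x_{n,k} \le k|a_{k-1}(f)|_p$.
Moreover, since $(-1)^{k-1} = \binom{-1}{k-1}$, remark 3.B.36 yields $x_{n,k} \le |a_{k-1}(f)|_p \cdot \frac{k^2}{p^n}$.
Putting everything together, we deduce that

$$x_{n,k} \le k|a_{k-1}(f)|_p \cdot \min\left(1, \frac{k}{p^n}\right).$$

Combined with $k|a_k(f)|_p \to 0$, this easily proves that $\sup_k x_{n,k} \to 0$, establishing
the first formula.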
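Both sides of the first formula are computable for a polynomial $f$, which gives a concrete feeling for the Volkenborn integral. The sketch below is a hedged illustration (helper names are ours): `riemann_average` is the average from definition 3.B.35 and `integral_from_mahler` is the series $\sum_n (-1)^n a_n(f)/(n+1)$, which is a finite sum for polynomials. For $f(x) = x^2$ the series gives $1/6$, and the averages approach it $p$-adically.

```python
from fractions import Fraction
from math import comb


def vp(q, p):
    """p-adic valuation of a nonzero rational number q."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("v_p(0) is infinite")
    v, num, den = 0, q.numerator, q.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def riemann_average(f, p, n):
    """(1/p^n) * sum_{j < p^n} f(j), for an integer-valued f."""
    return Fraction(sum(f(j) for j in range(p ** n)), p ** n)


def integral_from_mahler(f, count):
    """sum_{n < count} (-1)^n a_n(f) / (n+1), for an integer-valued f."""
    total = Fraction(0)
    for n in range(count):
        a_n = sum((-1) ** (n - k) * comb(n, k) * f(k) for k in range(n + 1))
        total += Fraction((-1) ** n * a_n, n + 1)
    return total


if __name__ == "__main__":
    square = lambda x: x * x
    exact = integral_from_mahler(square, 5)
    print(exact)
    for n in (1, 2, 3):
        print(n, vp(riemann_average(square, 5, n) - exact, 5))
```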




1) By remark 3.B.36, there are polynomials $P_n \in \mathbb{Q}[X, Y]$ such that

$$\binom{u}{n} - \binom{x}{n} = (u - x) P_n(u, x)$$

for all $u, x$ and $|P_n(u, x)|_p \le n$. Since $n|a_n(f)|_p \to 0$, it follows that the series

$$\frac{f(u) - f(x)}{u - x} = \sum_{n \ge 0} a_n(f) P_n(u, x)$$

converges uniformly and its limit as $u \to x$ is $\sum_{n \ge 0} a_n(f) P_n(x, x)$, which is
continuous (as the series converges uniformly).

2) First, we compute the Mahler coefficients of $g_x(u) = f(x + u) - f(u)$
by using Vandermonde's identity 3.B.21:

$$g_x(u) = \sum_{n \ge 0} a_n(f) \left(\binom{x+u}{n} - \binom{u}{n}\right) = \sum_{n \ge 0} a_n(f) \sum_{k=0}^{n-1} \binom{u}{k}\binom{x}{n-k} = \sum_{k \ge 0} a_k(g_x) \binom{u}{k},$$

where

$$a_k(g_x) = \sum_{n > k} a_n(f) \binom{x}{n-k}.$$

Thus $k|a_k(g_x)|_p \le \sup_{n > k} n|a_n(f)|_p$, so $g_x$ is of class $C^1$ and we can express

$$G(x) = \int_{\mathbb{Z}_p} (f(x + u) - f(u))\,du$$

in terms of Mahler coefficients of $g_x$, using the first formula of the theorem:

$$G(x) = \sum_{k \ge 0} \frac{(-1)^k}{k+1} \sum_{n > k} a_n(f) \binom{x}{n-k}.$$

It is apparent from this formula that $G$ is continuous. Finally, if $x \ge 1$ is an
integer, we can write by definition

$$G(x) = \lim_{n \to \infty} \frac{1}{p^n} \sum_{j=0}^{p^n - 1} (f(x + j) - f(j)) = \lim_{n \to \infty} \frac{1}{p^n} \sum_{j=0}^{x-1} (f(p^n + j) - f(j)) = \sum_{j=0}^{x-1} f'(j) = Sf'(x).$$

The conclusion follows now by continuity of $G$ and $Sf'$ together with the
density of the set of positive integers in $\mathbb{Z}_p$. □


3.B.9 Application to the p-adic Γ-function


We are now ready to prove the following deep theorem, that will allow us 
to prove very difficult congruences: 


Theorem 3.B.38. $\log_p \circ \Gamma_p$ is an analytic function on $p\mathbb{Z}_p$ and has a power-series
expansion

$$\log_p \Gamma_p(x) = \lambda_0 x - \sum_{n \ge 1} \frac{\lambda_n}{2n(2n+1)} x^{2n+1},$$

where

$$\lambda_0 = \int_{\mathbb{Z}_p^*} \log_p(u)\,du, \qquad \lambda_n = \int_{\mathbb{Z}_p^*} x^{-2n}\,dx = \lim_{k \to \infty} \frac{1}{p^k} \sum_{\substack{i=1 \\ \gcd(i,p)=1}}^{p^k - 1} i^{-2n}.$$

Proof. In this proof we will use the simpler notation $|x|$ for $|x|_p$. Using proposition
3.B.33, we deduce that if $f(x) = \log_p(\Gamma_p(x))$, then

$$f(x+1) - f(x) = 1_{|x|=1} \log_p(x).$$




This functional equation combined with the fact that $f(0) = 0$ (this follows
from proposition 3.B.33) shows that $f = SA$, where $A(x) = 1_{|x|=1} \log_p(x)$.
But it is easy to find an antiderivative of $A$: reasoning from the original power
series, one chooses $B(x) = 1_{|x|=1}(x \cdot \log_p(x) - x)$ and checks that $B' = A$.
Note that thanks to the hypothesis $x \in p\mathbb{Z}_p$, we have $|u|_p = 1$ if and only if
$|u + x|_p = 1$, for all $u \in \mathbb{Z}_p$. Therefore, theorems 3.B.30 and 3.B.37 yield the
integral representation of $f$:

$$\log_p(\Gamma_p(x)) = \int_{\mathbb{Z}_p} 1_{|u|=1}\big((x+u)\log_p(x+u) - (x+u) - u\log_p u + u\big)\,du.$$


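Using the fact that $\log_p(x + u) = \log_p u + \log_p(1 + x/u)$ and expanding

$$\log_p(1 + x/u) = \sum_{n \ge 1} \frac{(-1)^{n-1}}{n} \frac{x^n}{u^n}$$

for $|u| = 1$ yields

$$(x+u)\log_p(x+u) - x - u\log_p u = (x+u)\log_p u + (x+u)\sum_{n \ge 1} \frac{(-1)^{n-1}}{n}\frac{x^n}{u^n} - x - u\log_p u$$
$$= x\log_p u + \sum_{n \ge 1} \frac{(-1)^{n-1}}{n}\frac{x^{n+1}}{u^n} + \sum_{n \ge 2} \frac{(-1)^{n-1}}{n}\frac{x^n}{u^{n-1}}$$

and an easy calculation (based on the change of variable $n - 1 = j$ in the last
sum) yields

$$1_{|u|=1}\big((x+u)\log_p(x+u) - (x+u) - u\log_p u + u\big) = x \cdot 1_{|u|=1}\log_p u + \sum_{n \ge 1} f_{n,x}(u),$$

where

$$f_{n,x}(u) = 1_{|u|=1} \frac{(-1)^{n+1} x^{n+1}}{n(n+1)\,u^n}.$$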


Of course, we integrate this, but the difficult point is to check that the integral
commutes with the infinite sum. Again using theorems 3.B.30 and 3.B.37, we
can write

$$\sum_{n \ge 1} \int_{\mathbb{Z}_p} f_{n,x}(u)\,du = \sum_{n \ge 1} \sum_{k \ge 0} \frac{(-1)^k}{k+1} a_k(f_{n,x}) = \sum_{k \ge 0} \frac{(-1)^k}{k+1} \sum_{n \ge 1} a_k(f_{n,x}) = \sum_{k \ge 0} \frac{(-1)^k}{k+1} a_k\Big(\sum_{n \ge 1} f_{n,x}\Big),$$

all manipulations being easily justified by the fact that $f_{n,x}$ takes small values
(and so its Mahler coefficients are small). We finally deduce that

$$\log_p \Gamma_p(x) = \lambda_0 x + \sum_{n \ge 1} \int_{\mathbb{Z}_p} f_{n,x}(u)\,du.$$
It remains to simplify this a little bit thanks to the following

Lemma 3.B.39. If $g$ is of class $C^1$ and odd, then

$$\int_{\mathbb{Z}_p} g(u)\,du = -\frac{g'(0)}{2}.$$

Proof. We leave this as an easy exercise to the reader. We give a hint, however:
write

$$g(p^n - j) = -g(j - p^n) = -\big(g(j) - p^n g'(j) + O(p^{2n})\big)$$

and observe that $\sum_{j=0}^{p^n - 1} g'(j) = (Sg')(p^n)$. □

Applying this to the functions $f_{n,x}$ with $n$ odd, we obtain that

$$\int_{\mathbb{Z}_p} f_{n,x}(u)\,du = 0$$

for $n$ odd. Putting everything together, the result follows. □




We would like to have more information about the numbers $\lambda_n$ that appear
in the previous theorem. A first step in doing this is the following lemma:

Lemma 3.B.40. We have

$$\lambda_n - (1 - p^{2n-1}) \int_{\mathbb{Z}_p} x^{2n}\,dx \in \mathbb{Z}_p.$$

Proof. Using the fact that $g \mapsto g^{-1}$ is a permutation of the finite group
$(\mathbb{Z}/p^k\mathbb{Z})^*$, we deduce that

$$\sum_{\substack{i=1 \\ \gcd(i,p)=1}}^{p^k - 1} i^{-2n} \equiv \sum_{\substack{i=1 \\ \gcd(i,p)=1}}^{p^k - 1} i^{2n} = \sum_{i=0}^{p^k - 1} i^{2n} - p^{2n} \sum_{i=0}^{p^{k-1} - 1} i^{2n} \pmod{p^k \mathbb{Z}_p}.$$

Dividing by $p^k$ and letting $k \to \infty$, the result follows. □
Next, let

$$B_n = \int_{\mathbb{Z}_p} x^n\,dx.$$

By linearity of the Volkenborn integral we can write for $n > 0$

$$\sum_{k=0}^{n} \binom{n+1}{k} B_k = \int_{\mathbb{Z}_p} \sum_{k=0}^{n} \binom{n+1}{k} x^k\,dx = \int_{\mathbb{Z}_p} \big((x+1)^{n+1} - x^{n+1}\big)\,dx$$
$$= \lim_{k \to \infty} \frac{1}{p^k} \sum_{j=0}^{p^k - 1} \big((j+1)^{n+1} - j^{n+1}\big) = \lim_{k \to \infty} p^{kn} = 0,$$

which immediately yields the exponential generating function of the sequence
$(B_n)_n$:

$$\sum_{n \ge 0} \frac{B_n}{n!} X^n = \frac{X}{e^X - 1}.$$

This allows us to compute $B_0 = 1$, $B_1 = -\frac{1}{2}$, $B_2 = \frac{1}{6}$ and to deduce that all
$B_n$ are rational numbers. Actually, these numbers $B_n$ are called Bernoulli
numbers and appear all over the place in mathematics. Their rather subtle
arithmetic properties will help us prove the following important


Theorem 3.B.41. One has $p\lambda_n \in \mathbb{Z}_p$ for all $n \ge 1$. Moreover, if $p > 3$, then
$\lambda_1 \in \mathbb{Z}_p$.

Proof. Using the previous lemma and discussion, it suffices to prove that
$pB_n \in \mathbb{Z}_p$ for all $n$ (the second part follows from the observation that $B_2 = \frac{1}{6}$
is prime to $p$ if $p > 3$, so we will not say more about it). We will prove this
by induction on $n$. For $n = 1$, it is clear, so assume that it holds for $i < n$.

Recall that

$$\frac{X}{e^X - 1} = \sum_{n \ge 0} \frac{B_n}{n!} X^n.$$

Let $k$ be a positive integer. Multiplying this by $e^{kX}$ yields the equality

$$\frac{X e^{kX}}{e^X - 1} = \sum_{n \ge 0} \frac{X^n}{n!} \sum_{i=0}^{n} \binom{n}{i} B_i k^{n-i}.$$

On the other hand,

$$\frac{X e^{kX}}{e^X - 1} = \frac{X}{e^X - 1} + X\big(1 + e^X + \cdots + e^{(k-1)X}\big).$$

Identifying the coefficients of $X^{n+1}$ in this equality shows that for all $n \ge 1$

$$1^n + 2^n + \cdots + (k-1)^n = \frac{1}{n+1} \sum_{i=0}^{n} \binom{n+1}{i} B_i k^{n+1-i}.$$

Take $k = p$ and note that $\frac{p^{n+1-i}}{n+1}\binom{n+1}{i} B_i = (pB_i)\binom{n}{i}\frac{p^{n-i}}{n+1-i} \in \mathbb{Z}_p$ for all $0 \le i \le n - 1$.
Combining this observation and the previous equality, we deduce that
$pB_n \in \mathbb{Z}_p$, finishing the proof of the theorem. □
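The recurrence $\sum_{k=0}^{n}\binom{n+1}{k} B_k = 0$ (for $n \ge 1$) obtained above determines the Bernoulli numbers, so both the power-sum identity and the integrality of $pB_n$ can be tested. The Python sketch below is an illustration with names of our choosing; it uses the convention $B_1 = -\frac{1}{2}$ of the text.

```python
from fractions import Fraction
from math import comb


def bernoulli(count):
    """B_0, ..., B_{count-1} from sum_{k=0}^n C(n+1,k) B_k = 0 (n >= 1), B_0 = 1."""
    B = [Fraction(1)]
    for n in range(1, count):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B


def power_sum_via_bernoulli(n, k, B):
    """(1/(n+1)) sum_{i=0}^n C(n+1,i) B_i k^(n+1-i), expected to be 1^n + ... + (k-1)^n."""
    return sum(comb(n + 1, i) * B[i] * k ** (n + 1 - i) for i in range(n + 1)) / (n + 1)


def p_times_B_is_integral(p, count):
    """Check that p * B_n has denominator prime to p for all n < count."""
    return all((p * b).denominator % p != 0 for b in bernoulli(count))


if __name__ == "__main__":
    print(bernoulli(7))
    print(power_sum_via_bernoulli(3, 5, bernoulli(10)), 1 + 8 + 27 + 64)
    print(p_times_B_is_integral(5, 30))
```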




3.B.10 Some deep congruences 


p-adic numbers play a crucial role in establishing difficult congruences, 
which are almost impossible to get by other means. The following theorem 
is a very famous such example. We present here Robert’s and Zuber’s nice 
proof, following [68]. 


Theorem 3.B.42. (Kazandzidis) If $p > 3$, then for all positive integers $n, k$
we have

$$\binom{pn}{pk} \equiv \binom{n}{k} \pmod{p^j},$$

where $j = 3 + v_p(nk(n-k)) + v_p\big(\binom{n}{k}\big)$.


Proof. Proposition 3.B.33 shows that for all positive integers $x$ we have

$$\Gamma_p(px) = (-1)^{px} \frac{(px)!}{p^x\, x!}.$$

Therefore, if we denote $x = kp$ and $y = (n-k)p$, we have

$$\frac{\binom{pn}{pk}}{\binom{n}{k}} = \frac{\Gamma_p(x+y)}{\Gamma_p(x)\Gamma_p(y)}.$$

The definition of $\log_p(x)$ as a power series shows that $|x - 1|_p = |\log_p(x)|_p$ for
all $x \in 1 + p\mathbb{Z}_p$. So, if we let $f(x) = \log_p(\Gamma_p(x))$, the congruence we have to
prove is equivalent to

$$v_p(f(x+y) - f(x) - f(y)) \ge v_p(xy(x+y)).$$

But theorem 3.B.38 yields

$$v_p(f(x+y) - f(x) - f(y)) = v_p\left(\sum_{n \ge 1} \frac{\lambda_n}{2n(2n+1)}\big[(x+y)^{2n+1} - x^{2n+1} - y^{2n+1}\big]\right) \ge \inf_{n \ge 1} v_p(a_n),$$

where

$$a_n = \frac{\lambda_n}{2n(2n+1)}\big[(x+y)^{2n+1} - x^{2n+1} - y^{2n+1}\big].$$

It is thus enough to prove that $v_p(a_n) \ge v_p(xy(x+y))$ for all $n \ge 1$. For $n = 1$,
this follows from $\lambda_1 \in \mathbb{Z}_p$, which was proved in theorem 3.B.41. So, suppose
that $n \ge 2$. Since $(X+1)^{2n+1} - X^{2n+1} - 1$ vanishes at 0 and $-1$, there is
$f \in \mathbb{Z}[X]$ of degree $2n - 2$ such that

$$(1 + X)^{2n+1} - X^{2n+1} - 1 = X(1 + X) f(X).$$

Since $y^{2n-2} f(x/y)$ is a homogeneous polynomial of degree $2n - 2$ with integer
coefficients in $x, y \in p\mathbb{Z}_p$, we deduce that

$$v_p\big((x+y)^{2n+1} - x^{2n+1} - y^{2n+1}\big) = v_p(xy(x+y)) + v_p\left(y^{2n-2} f\left(\frac{x}{y}\right)\right) \ge v_p(xy(x+y)) + 2n - 2.$$

It is thus enough to prove that

$$v_p(\lambda_n) + 2n - 2 \ge v_p(2n(2n+1)),$$

which follows easily from $v_p(\lambda_n) \ge -1$ (theorem 3.B.41) and the inequalities

$$v_p(2n(2n+1)) \le \max(v_p(2n), v_p(2n+1)) \le \log_5(2n+1) \le 2n - 3,$$

the last one being true for $n \ge 2$. □
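Kazandzidis' congruence involves only integers, so it lends itself to a direct numerical check for small parameters. The Python sketch below is an illustration (the function names are ours); it restricts to $0 < k < n$ so that $nk(n-k) \ne 0$.

```python
from math import comb


def vp(m, p):
    """p-adic valuation of a nonzero integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v


def kazandzidis_exponent(n, k, p):
    """j = 3 + v_p(n k (n-k)) + v_p(C(n,k)), for 0 < k < n."""
    return 3 + vp(n * k * (n - k), p) + vp(comb(n, k), p)


def congruence_holds(n, k, p):
    """True if C(pn, pk) = C(n, k) modulo p^j with j as in theorem 3.B.42."""
    j = kazandzidis_exponent(n, k, p)
    return (comb(p * n, p * k) - comb(n, k)) % p ** j == 0


if __name__ == "__main__":
    print(all(congruence_holds(n, k, 5) for n in range(2, 11) for k in range(1, n)))
    print(congruence_holds(2, 1, 3))   # the hypothesis p > 3 is needed
```

For $p = 5$, $n = 2$, $k = 1$ one finds $\binom{10}{5} - \binom{2}{1} = 250 = 2 \cdot 5^3$, so the exponent $j = 3$ cannot be improved in general.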


Similar techniques apply to prove the following nice congruence, taken 
from [17]. 


Theorem 3.B.43. If $p > 5$ and $n \ge 1$, then

$$\frac{(np)!}{n!\,p^n} \equiv ((p-1)!)^n \pmod{p^{3 + v_p(n^3 - n)}}$$

and this does not hold modulo $p^{4 + v_p(n^3 - n)}$ unless $p$ is a Wolstenholme prime,
i.e. $\binom{2p-1}{p-1} \equiv 1 \pmod{p^4}$.




There are similar sharp congruences for p = 2,3,5, for which we refer the 
reader to [17]. 


Proof. (sketch) As in the proof of the previous theorem, one starts by noting
that

$$\Gamma_p(np) = (-1)^{np} \frac{(np)!}{n!\,p^n}, \qquad \Gamma_p(p) = -(p-1)!,$$

so the congruence reduces to

$$v_p\big(\Gamma_p(np) - \Gamma_p(p)^n\big) \ge 3 + v_p(n^3 - n).$$

If $f(x) = \log_p(\Gamma_p(x))$, the equality $v_p(y - 1) = v_p(\log_p y)$ (true for $y \in 1 + p\mathbb{Z}_p$)
reduces the problem to showing that

$$v_p(f(np) - nf(p)) \ge 3 + v_p(n^3 - n).$$


One uses again theorem 3.B.38, but one also has to analyze the term containing
$\lambda_0$. As the details are routine, they are left to the reader. □


Here is yet another similar congruence, with the same idea. 


Theorem 3.B.44. (F. Rodrigo-Villegas) Let $(a_n)_n$ be a sequence of integers,
finitely many of which are nonzero and such that $\sum_{n \ge 1} n a_n = 0$. Define

$$A(n) = \prod_{k \ge 1} (kn)!^{a_k}.$$

Then for any prime $p > 3$ and any $s \ge 1$ we have

$$\frac{A(p^s)}{A(p^{s-1})} \equiv 1 \pmod{p^{3s}}.$$

Proof. (sketch) The point is to study $\log_p \frac{A(p^s)}{A(p^{s-1})}$ by expressing it in terms
of values of the function $x \mapsto \log_p \Gamma_p(x)$. For instance, if $s = 1$, then this equals
$\sum_k a_k \log_p \Gamma_p(kp)$. One finishes as usual by using the analytic expansion of
$f(x)$. □




Theorem 3.B.45. Let $n$ be a multiple of an odd prime $p$. Then we have an
equality in $\mathbb{Q}_p$:

$$\sum_{\substack{k=1 \\ \gcd(k,p)=1}}^{n-1} \frac{1}{k} = -\sum_{j \ge 1} \frac{\lambda_j}{2j}\, n^{2j},$$

where $\lambda_j$ are as in theorem 3.B.38.

Proof. (sketch) Note that $f(x) = \log_p(\Gamma_p(x))$ is locally analytic on $\mathbb{Z}_p$, as
follows from theorem 3.B.38 and from the functional equation
$f(x+1) - f(x) = 1_{|x|_p=1} \log_p x$.
It follows easily that $f$ is differentiable and that $g = f'$ satisfies

$$g(x+1) - g(x) = 1_{|x|_p=1} \cdot \frac{1}{x}.$$

Thus

$$g(n) - g(1) = \sum_{k=1,\ \gcd(k,p)=1}^{n-1} \frac{1}{k}.$$

Next, by differentiating the Taylor expansion of $f$ in theorem 3.B.38, we obtain
that for $x \in p\mathbb{Z}_p$

$$g(x) = \lambda_0 - \sum_{j \ge 1} \frac{\lambda_j}{2j}\, x^{2j}.$$

The result follows by combining these two identities (with $x = n$ in the second
one) and by observing that $g(1) = g(0) = \lambda_0$. □


3.B.11 More on Bernoulli numbers 


In this section we focus on some classical congruences satisfied by
Bernoulli numbers. Recall from section 3.B.9 that they are defined by
$B_n = \int_{\mathbb{Z}_p} x^n\,dx$. We saw that $(B_n)_n$ is a sequence of rational numbers, with
exponential generating function

$$\sum_{n \ge 0} B_n \frac{X^n}{n!} = \frac{X}{e^X - 1}.$$

Lemma 3.B.39 shows that $B_n = 0$ for all odd numbers $n > 1$ (this can also
be deduced from the generating function, by checking that the function
$x \mapsto \frac{x}{e^x - 1} + \frac{x}{2}$ is even). We saw during the proof of theorem 3.B.41 that Bernoulli
numbers satisfy the crucial identity

$$S_n(k) = 1^n + 2^n + \cdots + (k-1)^n = \frac{1}{n+1} \sum_{i=0}^{n} \binom{n+1}{i} B_i k^{n+1-i}.$$





This identity will play a crucial role in the proof of the following beautiful 
result, which considerably refines theorem 3.B.41. 


Theorem 3.B.46. (von Staudt-Clausen theorem) For all $n \ge 1$ we have

$$B_{2n} + \sum_{p - 1 \mid 2n} \frac{1}{p} \in \mathbb{Z}.$$

Proof. Let $p$ be any prime. We need the following elementary result:

Lemma 3.B.47. If $n$ is even, then for all $k \ge 1$ we have $S_n(p^{k+1}) \equiv p \cdot S_n(p^k)
\pmod{p^{k+1}}$.

Proof. This is a simple application of the binomial formula:

$$S_n(p^{k+1}) = \sum_{i=0}^{p-1} \sum_{j=0}^{p^k - 1} (j + i \cdot p^k)^n \equiv p \cdot S_n(p^k) + n p^k \cdot \frac{p(p-1)}{2} \sum_{j=0}^{p^k - 1} j^{n-1} \pmod{p^{2k}},$$

and the last term is divisible by $p^{k+1}$ (for $p = 2$ this uses that $n$ is even).
Note that if $p$ is odd, then the hypothesis that $n$ is even is useless. □

Next, we claim that $\frac{S_{2n}(p)}{p} - B_{2n} \in \mathbb{Z}_p$ for all primes $p$ and all $n$. Note
that the previous lemma implies that $\frac{S_{2n}(p^{k+1})}{p^{k+1}} - \frac{S_{2n}(p^k)}{p^k} \in \mathbb{Z}_p$ for all $k \ge 1$,
hence it is enough to show that we can find $k$ such that $\frac{S_{2n}(p^k)}{p^k} - B_{2n} \in \mathbb{Z}_p$.
But

$$\frac{S_{2n}(p^k)}{p^k} - B_{2n} = \frac{1}{2n+1} \sum_{j=0}^{2n-1} \binom{2n+1}{j} B_j\, p^{k(2n-j)}$$

is certainly in $\mathbb{Z}_p$ for $k$ large enough, as each of the terms in the sum is in $\mathbb{Z}_p$.
This proves the claim.

Finally, a standard argument¹ shows that $S_n(p) \equiv 0 \pmod{p}$ unless $p - 1$
divides $n$, in which case $S_n(p) \equiv -1 \pmod{p}$. Combining this with the result
of the previous paragraph, we deduce that $B_{2n} \in \mathbb{Z}_p$ unless $p - 1 \mid 2n$, in which
case $B_{2n} + \frac{1}{p} \in \mathbb{Z}_p$. All in all, this shows that the rational number
$B_{2n} + \sum_{q - 1 \mid 2n} \frac{1}{q}$ belongs to $\mathbb{Z}_p$ for all primes $p$ and so it is an integer. The result
follows. □
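The von Staudt-Clausen theorem is pleasant to verify by hand for small $n$, and just as easy by machine. The following Python sketch is an illustration only (names are ours); it recomputes the Bernoulli numbers from the recurrence of section 3.B.9 and adds $\frac{1}{p}$ over the primes $p$ with $p - 1 \mid 2n$.

```python
from fractions import Fraction
from math import comb


def bernoulli(count):
    """B_0, ..., B_{count-1} from sum_{k=0}^n C(n+1,k) B_k = 0 (n >= 1), B_0 = 1."""
    B = [Fraction(1)]
    for n in range(1, count):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B


def is_prime(m):
    """Trial-division primality test, sufficient for small m."""
    return m >= 2 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))


def staudt_clausen_sum(n):
    """B_{2n} + sum of 1/p over the primes p with (p - 1) | 2n."""
    B = bernoulli(2 * n + 1)
    primes = [p for p in range(2, 2 * n + 2) if is_prime(p) and (2 * n) % (p - 1) == 0]
    return B[2 * n] + sum(Fraction(1, p) for p in primes)


if __name__ == "__main__":
    print([staudt_clausen_sum(n) for n in range(1, 9)])   # all integers
```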


The following classical result is considerably more difficult to prove. The 
proof is actually rather mysterious, though elementary. It is highly related to 
the existence of p-adic analogues of the Riemann zeta function, though this 
may not be apparent at all... 


Theorem 3.B.48. (Kummer's congruences) Let $m, n$ be positive integers and
let $p$ be a prime such that $p - 1$ does not divide $n$, but $p - 1$ divides $m - n$.
Then $\frac{B_m}{m} - \frac{B_n}{n} \in p\mathbb{Z}_p$.

Proof. It suffices to prove that $\frac{B_{k+p-1}}{k+p-1} \equiv \frac{B_k}{k} \pmod{p}$ whenever $p - 1$ does not
divide $k \ge 1$. Note that it is not even clear that $\frac{B_k}{k} \in \mathbb{Z}_p$, but this will be the
case (of course, this is clear if $k$ is odd, as then $B_k = 0$ or $k = 1$). Let us fix an integer
$1 \le a \le p - 1$ such that $a$ is a primitive root modulo $p$. The following lemma
is crucial.


¹ Let $g$ be a primitive root mod $p$. If $p - 1$ does not divide $n$, then $g^n \not\equiv 1$ and
$S_n(p) \equiv \sum_{i=0}^{p-2} g^{in} = \frac{g^{(p-1)n} - 1}{g^n - 1} \equiv 0 \pmod{p}$.




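Lemma 3.B.49. There exist $b_n \in \mathbb{Z}_p$ such that (as formal series)

$$\frac{a}{e^{aX} - 1} - \frac{1}{e^X - 1} = \sum_{n \ge 0} b_n (e^X - 1)^n.$$

Proof. By changing the variable $Y = e^X - 1$, this is equivalent to

$$\frac{a}{(1+Y)^a - 1} - \frac{1}{Y} = \sum_{n \ge 0} b_n Y^n.$$

The binomial formula and the fact that $\gcd(a, p) = 1$ yield the existence of
$g \in Y\mathbb{Z}_p[Y]$ such that $(1 + Y)^a - 1 = aY(1 + g(Y))$. We then have

$$\frac{a}{(1+Y)^a - 1} - \frac{1}{Y} = \frac{1}{Y}\left(\frac{1}{1 + g(Y)} - 1\right) = \frac{1}{Y} \sum_{n \ge 1} (-1)^n g(Y)^n.$$

The result follows, as $g \in Y\mathbb{Z}_p[Y]$. □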


Let us denote $U_n = (a^n-1)\cdot\frac{B_n}{n}$ and observe that by definition of $(B_n)_n$ we have

$$\frac{a}{e^{ax}-1} - \frac{1}{e^x-1} = \sum_{n \ge 0} U_{n+1}\frac{x^n}{n!}.$$

To fully exploit lemma 3.B.49, let us denote

$$s_{n,k} = \sum_{j=0}^{n}(-1)^{n-j}\binom{n}{j}j^k,$$

so that

$$(e^x-1)^n = \sum_{k \ge 0} s_{n,k}\frac{x^k}{k!}.$$

Replacing these relations in lemma 3.B.49 and identifying coefficients yields²

$$U_{n+1} = \sum_{j \ge 0} b_j \cdot s_{j,n}$$

for all $n \ge 0$.

² Note that there are no convergence issues, as $s_{n,k} = 0$ for $n > k$.

It is now easy to conclude. Recall that $p-1$ does not divide $k$, so $p$ does not divide $a^k-1$. The previous relation and the fact that $b_j, s_{j,n} \in \mathbb{Z}_p$ for all $j, n$ show that $U_k \in \mathbb{Z}_p$ and so $\frac{B_k}{k} \in \mathbb{Z}_p$. The same argument shows that $\frac{B_{k+p-1}}{k+p-1} \in \mathbb{Z}_p$. Also, Fermat's little theorem yields $s_{j,n} \equiv s_{j,n+p-1} \pmod p$ for all $j \ge 0$ and all $n \ge 1$. Hence $U_n \equiv U_{n+p-1} \pmod p$ for all $n > 1$. As $a^{k+p-1} \equiv a^k \pmod p$, this congruence is equivalent to

$$(a^k-1)\frac{B_k}{k} \equiv (a^k-1)\frac{B_{k+p-1}}{k+p-1} \pmod p.$$

The desired result follows from this congruence and the fact that $p$ does not divide $a^k-1$. □
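Kummer's congruence can be checked on small cases with exact fractions. In the sketch below (the helper names are ours; the convention $B_1 = -1/2$ is assumed), membership in $p\mathbb{Z}_p$ is tested by asking that $p$ divide the numerator but not the denominator of the reduced fraction.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(limit):
    """Return [B_0, ..., B_limit] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))
    return B


def in_p_Zp(r, p):
    """True if the rational r lies in p * Z_p."""
    return r.denominator % p != 0 and r.numerator % p == 0


def kummer_holds(p, k, B):
    """Check B_{k+p-1}/(k+p-1) = B_k/k (mod p), assuming (p-1) does not divide k."""
    diff = B[k + p - 1] / (k + p - 1) - B[k] / k
    return in_p_Zp(diff, p)


B = bernoulli_numbers(60)
checks = [kummer_holds(p, k, B)
          for p in (5, 7, 11, 13)
          for k in range(2, 40) if k % (p - 1) != 0]
print(all(checks))
```

For example, with $p = 5$ and $k = 2$ the difference is $\frac{B_6}{6} - \frac{B_2}{2} = \frac{1}{252} - \frac{1}{12} = -\frac{5}{63}$.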


Remark 3.B.50. Here is the real meaning of this proof. As we have already said, the crucial fact is that $f_a(X) = \frac{a}{(1+X)^a-1} - \frac{1}{X} \in \mathbb{Z}_p[[X]]$, say $f_a(X) = \sum_{n \ge 0} b_n X^n$. This allows us to define a measure $\mu_a$ on $\mathbb{Z}_p$ (i.e. continuous linear form on the space of continuous functions $g : \mathbb{Z}_p \to \mathbb{Q}_p$) by

$$\int_{\mathbb{Z}_p} g\,\mu_a = \sum_{n \ge 0} a_n(g) b_n$$

for all continuous functions $g : \mathbb{Z}_p \to \mathbb{Q}_p$. Here $a_n(g)$ is the $n$th Mahler coefficient of $g$ and the series converges by Mahler's theorem (as $\lim_{n \to \infty} a_n(g) = 0$ and $b_n \in \mathbb{Z}_p$). Since $b_n \in \mathbb{Z}_p$ for all $n$, Mahler's theorem shows that

$$v_p\left(\int_{\mathbb{Z}_p} g\,\mu_a\right) \ge \min_{x \in \mathbb{Z}_p} v_p(g(x))$$

for all continuous maps $g$.

Note that $s_{n,k} = a_n(x^k)$, so the equality $U_{n+1} = \sum_{j \ge 0} b_j s_{j,n}$ established during the proof can be written as

$$U_{n+1} = \int_{\mathbb{Z}_p} x^n\,\mu_a.$$

Since $v_p(x^n - x^{n+p-1}) \ge 1$ for all $x \in \mathbb{Z}_p$ and $n \ge 1$, the previous paragraph shows that $U_n \equiv U_{n+p-1} \pmod p$ for all $n > 1$.

This new interpretation of the proof of Kummer's congruence yields other nontrivial congruences. Assume for instance that $m \equiv n \pmod{p^N(p-1)}$ and $m, n > N$. By Euler's theorem we have $v_p(x^m - x^n) \ge N+1$ for all $x \in \mathbb{Z}_p$ and we deduce that $U_m \equiv U_n \pmod{p^{N+1}}$. This observation is the beginning of the construction of the p-adic zeta function of Kubota and Leopoldt.

We end this addendum with an application of the previous results to harmonic numbers. It is standard that for any prime $p > 3$ we have

$$\sum_{k=1}^{p-1}\frac{1}{k} \equiv 0 \pmod{p^2} \quad\text{and}\quad \sum_{k=1}^{p-1}\frac{1}{k^2} \equiv 0 \pmod p.$$

The following result considerably refines these congruences.

Theorem 3.B.51. For all primes $p > 3$ we have

$$\sum_{k=1}^{p-1}\frac{1}{k^2} \equiv \frac{2p}{3}B_{p-3} \pmod{p^2} \quad\text{and}\quad \sum_{k=1}^{p-1}\frac{1}{k} \equiv -\frac{p^2}{3}B_{p-3} \pmod{p^3}.$$

Proof. Euler's theorem yields

$$\sum_{k=1}^{p-1}\frac{1}{k^2} \equiv \sum_{k=1}^{p-1} k^{\varphi(p^2)-2} \pmod{p^2}.$$

We will use the following general result:

Lemma 3.B.52. If $p > 3$ and $n \ge 1$, then

$$1^n + 2^n + \dots + (p-1)^n \equiv pB_n + \frac{np^2B_{n-1}}{2} \pmod{p^2}.$$


Proof. The left-hand side is equal to

$$\frac{1}{n+1}\sum_{k=1}^{n+1}\binom{n+1}{k}B_{n+1-k}\,p^k = pB_n + \frac{np^2B_{n-1}}{2} + \frac{1}{n+1}\sum_{k=3}^{n+1}\binom{n+1}{k}B_{n+1-k}\,p^k.$$

It is thus enough to prove that $\frac{1}{n+1}\binom{n+1}{k}B_{n+1-k}\,p^k \in p^2\mathbb{Z}_p$ for all $3 \le k \le n+1$. As $\frac{1}{n+1}\binom{n+1}{k} = \frac{1}{k}\binom{n}{k-1}$, $pB_{n+1-k} \in \mathbb{Z}_p$ (by the von Staudt-Clausen theorem) and $\binom{n}{k-1} \in \mathbb{Z}_p$, it is enough to check that $\frac{p^{k-3}}{k} \in \mathbb{Z}_p$. This is easy and left to the reader (here one crucially uses the hypothesis $p > 3$). □


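Applying the previous lemma to $n = \varphi(p^2) - 2$ and noting that $B_{n-1} = 0$ (as $n$ is even), we obtain

$$\sum_{k=1}^{p-1} k^{\varphi(p^2)-2} \equiv pB_{\varphi(p^2)-2} \pmod{p^2}.$$

It remains to use Kummer's congruence to obtain $B_{\varphi(p^2)-2} \equiv \frac{2}{3}B_{p-3} \pmod p$ and finally

$$\sum_{k=1}^{p-1}\frac{1}{k^2} \equiv \frac{2p}{3}B_{p-3} \pmod{p^2}.$$

One could apply a similar method for the second congruence, but we prefer to deduce it from the first one. Note that

$$2\sum_{k=1}^{p-1}\frac{1}{k} = \sum_{k=1}^{p-1}\left(\frac{1}{k} + \frac{1}{p-k}\right) = p\sum_{k=1}^{p-1}\frac{1}{k(p-k)}.$$

Using this identity and the congruence we have just proved, it is enough to prove that

$$\sum_{k=1}^{p-1}\left(\frac{1}{k(p-k)} + \frac{1}{k^2}\right) \equiv 0 \pmod{p^2}.$$

Since $\frac{1}{k(p-k)} + \frac{1}{k^2} = \frac{p}{k^2(p-k)}$, this is equivalent to

$$\sum_{k=1}^{p-1}\frac{1}{k^2(p-k)} \equiv 0 \pmod p, \quad\text{that is, to}\quad \sum_{k=1}^{p-1}\frac{1}{k^3} \equiv 0 \pmod p.$$

The last congruence is also standard and it is proved using primitive roots mod $p$. □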
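Both congruences of theorem 3.B.51 can be verified for small primes by computing p-adic valuations of exact fractions. This is only a numerical sanity check under the convention $B_1 = -1/2$; the helper names `vp` and `check_prime` are ours.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(limit):
    """Return [B_0, ..., B_limit] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))
    return B


def vp(r, p):
    """p-adic valuation of the rational r (infinity for r = 0)."""
    if r == 0:
        return float("inf")
    v, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def check_prime(p, B):
    h1 = sum(Fraction(1, k) for k in range(1, p))
    h2 = sum(Fraction(1, k * k) for k in range(1, p))
    first = vp(h2 - Fraction(2 * p, 3) * B[p - 3], p) >= 2
    second = vp(h1 + Fraction(p * p, 3) * B[p - 3], p) >= 3
    return first and second


B = bernoulli_numbers(30)
print(all(check_prime(p, B) for p in (5, 7, 11, 13, 17, 19, 23, 29)))
```

For $p = 5$ one finds $\sum_{k=1}^{4}\frac1k + \frac{25}{3}B_2 = \frac{25}{12} + \frac{25}{18} = \frac{125}{36}$, whose 5-adic valuation is exactly 3.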


Chapter 4 


Primes and Squares 


This chapter is concerned with arithmetic properties of primes of the form 
4k+1. It is very elementary and most problems use the following two results. 


Theorem 4.1. (Fermat) Any prime number of the form 4k+1 can be written 
as the sum of squares of two integers. 


There are many proofs of this classical result, see for instance the first 
example in chapter 4 of [3]. For a different proof, using properties of Gauss 
and Jacobi sums, see the addendum 9.A. 


Proposition 4.2. Let $p$ be a prime of the form $4k+3$ and let $a$ and $b$ be integers such that $p$ divides $a^2+b^2$. Then $p$ divides $a$ and $b$.

Proof. If $p$ does not divide $a$, there exists $c$ such that $ca \equiv 1 \pmod p$. Then $(bc)^2 \equiv -1 \pmod p$, thus

$$(-1)^{\frac{p-1}{2}} \equiv (bc)^{p-1} \equiv 1 \pmod p,$$

the last congruence being Fermat's little theorem. But this is clearly impossible, since by hypothesis $(-1)^{\frac{p-1}{2}} = -1$. The result follows. □


An easy consequence of proposition 4.2 is that $v_p(a^2+b^2)$ is even for any prime $p$ of the form $4k+3$ and for any integers $a, b$. This is a very useful tool when studying some diophantine equations, as the following problems show.


1. Prove that the number $4mn - m - n$ cannot be a perfect square if $m$ and $n$ are positive integers.

Fermat

Proof. If $4mn - m - n = x^2$ for some integer $x$, then

$$(4m-1)(4n-1) = (2x)^2 + 1.$$

But there exists a prime $p \equiv 3 \pmod 4$ such that $p \mid 4m-1$. Then $p$ divides $(2x)^2 + 1$, contradicting proposition 4.2. □


2. Prove that the equation $y^2 = x^5 - 4$ has no integer solutions.

Balkan Olympiad 1998

Proof. Write the equation as $x^5 + 2^5 = y^2 + 6^2$. We claim that there is always a prime $p \equiv 3 \pmod 4$ such that $v_p(x^5+2^5)$ is odd. Since for any such prime $p$ we have $v_p(y^2+6^2) \equiv 0 \pmod 2$, we will thus get a contradiction. Note that $x$ is odd, otherwise $x = 2x_1$, $y = 2y_1$ and 8 divides $y_1^2+1$, which is impossible. If $x \equiv 1 \pmod 4$, there is a prime $p \equiv 3 \pmod 4$ such that $v_p(x+2) \equiv 1 \pmod 2$. One cannot have $p \mid x^4 - 2x^3 + 4x^2 - 8x + 16$ (otherwise $p$ would divide $5 \cdot 2^4$), so $v_p(x^5+2^5) = v_p(x+2) \equiv 1 \pmod 2$ and we are done. If $x \equiv -1 \pmod 4$, then $x^4 - 2x^3 + \dots + 16 \equiv 3 \pmod 4$ and we can repeat the argument by taking a prime $p \equiv 3 \pmod 4$ such that $v_p(x^4 - 2x^3 + \dots + 16) \equiv 1 \pmod 2$. The claim being proved, the result follows. □

Proof. We can work modulo 11: since $x^{11} \equiv x \pmod{11}$ for any $x$, we deduce that $x^5 \equiv 0, 1, -1 \pmod{11}$ for any $x$. On the other hand, the quadratic residues mod 11 are trivial to find: those are $0, 1, 4, 9, 5, 3$. One readily checks that the equation has no solution mod 11 using these observations. So, it does not have integral solutions either. □
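The finite check behind the second proof can be spelled out in a few lines. This sketch simply enumerates fifth powers and squares modulo 11 and tests whether $s + 4$ can be a fifth power for some square $s$ (the variable names are ours).

```python
MOD = 11

# All residues of x^5 and of y^2 modulo 11.
fifth_powers = {pow(x, 5, MOD) for x in range(MOD)}
squares = {pow(y, 2, MOD) for y in range(MOD)}

# y^2 = x^5 - 4 (mod 11) would require a square s with s + 4 a fifth power.
solvable = any((s + 4) % MOD in fifth_powers for s in squares)

print(sorted(fifth_powers))  # [0, 1, 10]
print(sorted(squares))       # [0, 1, 3, 4, 5, 9]
print(solvable)              # False
```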


3. Solve in integers the equation $x^2 = y^7 + 7$.

Titu Andreescu, USA TST 2008

Proof. We clearly have no solutions for $y < -1$, so let us suppose that $y + 2 > 0$. Taking the equation modulo 4, we easily obtain that $y \equiv 1 \pmod 4$. The key point is to rewrite the equation as $x^2 + 11^2 = y^7 + 2^7$ or, by factoring the right-hand side, as

$$x^2 + 11^2 = (y+2)(y^6 - 2y^5 + 4y^4 - 8y^3 + 16y^2 - 32y + 64).$$

Since $y \equiv 1 \pmod 4$, we have $y + 2 \equiv 3 \pmod 4$, thus there exists a prime $q \equiv 3 \pmod 4$ such that $v_q(y+2)$ is odd. Note that $q$ does not divide

$$y^6 - 2y^5 + 4y^4 - 8y^3 + 16y^2 - 32y + 64,$$

as otherwise $q$ would divide $7 \cdot 64$ and $x^2 + 11^2$, a contradiction. Thus $v_q(y^7 + 2^7)$ is odd, which is impossible, as it equals $v_q(x^2 + 11^2)$ and $q \equiv 3 \pmod 4$. The result follows. □


4. Find all pairs $(m, n)$ of positive integers such that

$$m^2 - 1 \mid 3^m + (n! - 2)^m.$$

Gabriel Dospinescu

Proof. First, assume that $n > 2$. Then $m$ cannot be odd, since otherwise 8 would divide $m^2 - 1$, but $3^m + (n! - 2)^m$ is odd. So $m$ is even. But then $m^2 - 1 \equiv -1 \pmod 4$, so there exists a prime $p \equiv -1 \pmod 4$ dividing $m^2 - 1$. But then $p$ divides $3^m + (n! - 2)^m$ and since $m$ is even, this implies that $p$ divides 3 and $n! - 2$. Thus 3 divides $n! - 2$, a contradiction.

Thus, we must have $n = 1$ or $n = 2$. If $n = 1$ then either $m^2 - 1 \mid 3^m + 1$ and $m$ is even or $m^2 - 1 \mid 3^m - 1$ and $m$ is odd. In the first case we can use the same argument as before to get a contradiction (choose a prime $p \equiv -1 \pmod 4$ dividing $m^2 - 1$), while the second case is impossible since $3^m - 1$ is not a multiple of 8 when $m$ is odd. Thus $n = 2$ and $m^2 - 1 \mid 3^m$. Thus there is $k \le m$ such that $(m-1)(m+1) = 3^k$. But then $m - 1, m + 1$ are powers of 3 which differ by 2. Thus clearly $m - 1 = 1$ and $m = 2$. We deduce that $m = n = 2$ is the only solution of the problem. □


5. Find all pairs $(x, y)$ of positive integers such that the number $\frac{x^2+y^2}{x-y}$ is a divisor of 1995.

Bulgaria 1995

Proof. Note that $1995 = 3 \cdot 5 \cdot 7 \cdot 19$. The key observation is that $3, 7, 19$ are all primes of the form $4k+3$. If $p$ is any of these primes and if $p$ divides $\frac{x^2+y^2}{x-y}$, then $p$ divides $x^2+y^2$ and so it divides $x, y$. Writing $x = px_1$, $y = py_1$, we obtain

$$\frac{x^2+y^2}{x-y} = p \cdot \frac{x_1^2+y_1^2}{x_1-y_1}.$$

Doing this for every prime factor $p \in \{3, 7, 19\}$ of $\frac{x^2+y^2}{x-y}$ and noting that $x^2+y^2 = x-y$ has no solutions with $x, y \ge 1$, we just have to solve the equation $\frac{x^2+y^2}{x-y} = 5$. This is easy, since we can write it as $(2x-5)^2 + (2y+5)^2 = 50$ and since 50 can only be written as a sum of two squares in two ways, $1 + 49 = 25 + 25$. Thus $2y+5 \in \{-7, -5, -1, 1, 5, 7\}$ and since $y$ is positive we must have $y = 1$. But then $x = 2$ or $x = 3$. Putting everything together, we deduce that the solutions are all pairs $(2k, k)$, $(3k, k)$ with $k \in \{1, 3, 7, 19, 21, 57, 133, 399\}$. □
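The answer can be confirmed by exhaustive search. The sketch below assumes that "divisor" means positive divisor (so $x > y$); for each divisor $d$ of 1995 it solves $x^2 + y^2 = d(x-y)$ as a quadratic in $x$. The function names are ours.

```python
from math import isqrt


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def solutions_for_quotient(d):
    """All positive (x, y) with x^2 + y^2 = d * (x - y)."""
    found = set()
    for y in range(1, d):
        # x^2 - d*x + (y^2 + d*y) = 0 is a quadratic equation in x.
        disc = d * d - 4 * (y * y + d * y)
        if disc < 0:
            break  # the discriminant only decreases as y grows
        r = isqrt(disc)
        if r * r == disc:
            for x in ((d + r) // 2, (d - r) // 2):
                if x > y and x * x + y * y == d * (x - y):
                    found.add((x, y))
    return found


all_solutions = set()
for d in divisors(1995):
    all_solutions |= solutions_for_quotient(d)

expected = {(m * k, k) for k in divisors(399) for m in (2, 3)}
print(all_solutions == expected)
```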


6. Find all $n$-tuples $(a_1, a_2, \dots, a_n)$ of positive integers such that

$$(a_1! - 1)(a_2! - 1) \cdots (a_n! - 1) - 16$$

is a perfect square.

Gabriel Dospinescu

Proof. Suppose that

$$(a_1! - 1)(a_2! - 1) \cdots (a_n! - 1) - 16 = k^2.$$

First, we claim that $a_i \in \{2, 3\}$ for all $i$. It is clear that $a_i \ne 1$ for all $i$, so assume that $a_i > 3$ for some $i$. Then $a_i! - 1 \equiv 3 \pmod 4$, thus there is a prime $p \equiv 3 \pmod 4$ such that $p$ divides $a_i! - 1$. Then $p$ divides $k^2 + 4^2$, a contradiction. Say $m$ among the numbers $a_i$ are equal to 3 and the remaining ones are equal to 2. The equation becomes $5^m - 16 = k^2$. As $k$ is odd, a consideration mod 8 shows that $m$ is even. But then $(5^{m/2} - k)(5^{m/2} + k) = 16$, which easily implies that $m = 2$. Thus the sequence $(a_1, a_2, \dots, a_n)$ consists of two numbers equal to 3 and the remaining equal to 2. Clearly, all such sequences are solutions of the problem. □


7. Prove that there are infinitely many pairs of consecutive numbers, no two of which have any prime factor of the form $4k+3$.

Proof. Since no number of the form $n^2+1$ has a prime factor of the form $4k+3$, it is clear that the pairs $((n^2+1)^2, (n^2+1)^2+1)$ yield solutions of the problem. □


The following problem is trickier and its solution uses the theory of Pell 
equations. 


8. Let $p$ be an odd prime. Prove that $p \equiv 1 \pmod 4$ if and only if there are integers $x, y$ such that $x^2 - py^2 = -1$.

Proof. One direction follows directly from proposition 4.2, so let us assume that $p \equiv 1 \pmod 4$. The key point is to consider the positive Pell equation $x^2 - py^2 = 1$. By the general theory of Pell equations, this has a smallest nontrivial solution $(x_0, y_0)$ (so $y_0 > 0$ is minimal among all solutions with $y \ne 0$). Note that $x_0$ is odd, since otherwise 4 would divide $y_0^2 + 1$. If $x_0 = 2a+1$, we deduce that $y_0$ is even and $a^2 + a = pb^2$ for $b = \frac{y_0}{2}$. If $p$ divides $a$, then $a = pc^2$ and $a + 1 = d^2$ for some integers $c, d$ (because $a$ and $a+1$ are relatively prime), so that $d^2 - pc^2 = 1$. But obviously $c < y_0$, contradicting the minimality of $(x_0, y_0)$. Thus $p$ divides $a+1$ and we can write $a = c^2$, $a + 1 = pd^2$ for some integers $c, d$. But then $c^2 - pd^2 = -1$ and we are done. □
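A short search illustrates the statement for small primes. The sketch below (the function name and the search bound `y_max` are our choices) looks for the smallest $y$ with $py^2 - 1$ a perfect square; a failed search within the bound is of course not a proof of non-existence, but for $p \equiv 3 \pmod 4$ non-existence is guaranteed by proposition 4.2.

```python
from math import isqrt


def negative_pell_solution(p, y_max=10_000):
    """Smallest (x, y) with 1 <= y <= y_max and x^2 - p*y^2 = -1, or None."""
    for y in range(1, y_max + 1):
        x2 = p * y * y - 1
        x = isqrt(x2)
        if x * x == x2:
            return x, y
    return None


for p in (5, 13, 17, 29, 37, 41, 53):    # primes congruent to 1 mod 4
    print(p, negative_pell_solution(p))
for p in (3, 7, 11, 19, 23):              # primes congruent to 3 mod 4
    print(p, negative_pell_solution(p))   # None in every case
```

For instance $18^2 - 13 \cdot 5^2 = -1$ and $70^2 - 29 \cdot 13^2 = -1$.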


We continue with another beauty, which became classical in mathematical 
contests. It has the nice feature of having a completely elementary solution 
that uses quite a lot of different ideas. 


9. Find all positive integers $n$ such that the number $2^n - 1$ has a multiple of the form $m^2 + 9$.

IMO 1999 Shortlist

Proof. Assume that $2^n - 1$ divides $m^2 + 9$ for some integer $m$ and that $n \ge 2$. Then the only prime factor of $2^n - 1$ which is 3 modulo 4 is 3. Indeed, if $p$ is such a prime, then $p$ divides $m^2 + 3^2$ and so $p$ divides 3. Now, if $n$ has a nontrivial odd divisor $d$, then $2^d - 1 \equiv -1 \pmod 4$ and 3 does not divide $2^d - 1$. Thus $2^d - 1$ has a prime factor $p \equiv 3 \pmod 4$ different from 3. Since $2^d - 1$ divides $2^n - 1$, this is impossible, by the previous remark. Thus, $n$ must be a power of 2. Conversely, assume that $n = 2^k$ and observe that

$$2^n - 1 = 3 \cdot (2^2+1)(2^4+1) \cdots (2^{2^{k-1}}+1).$$

Choosing $m = 3a$, it is enough to find $a$ such that $(2^2+1)(2^4+1) \cdots (2^{2^{k-1}}+1)$ divides $a^2+1$. The crucial point is that the Fermat numbers $2^{2^i}+1$ are pairwise relatively prime, so by the Chinese Remainder Theorem it is sufficient to prove that for any $i$ there is $a$ such that $a^2 + 1 \equiv 0 \pmod{2^{2^i}+1}$. But this is clear, since $a = 2^{2^{i-1}}$ works. Thus, the answer to the problem is: all powers of 2. □
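The answer is easy to confirm for small exponents by brute force, since it suffices to test $m$ modulo $2^n - 1$. The function name below is ours.

```python
def has_multiple_of_form(n):
    """True if some integer m satisfies (2^n - 1) | m^2 + 9."""
    modulus = 2 ** n - 1
    # Only the residue of m modulo 2^n - 1 matters.
    return any((m * m + 9) % modulus == 0 for m in range(modulus))


good = [n for n in range(1, 13) if has_multiple_of_form(n)]
print(good)  # [1, 2, 4, 8]
```

For $n = 4$, for example, $m = 6$ works: $6^2 + 9 = 45 = 3 \cdot 15$.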


The next problems are concerned with properties of sums of two squares. A crucial fact is that the set of numbers which can be written as a sum of two squares of integers is closed under multiplication, by Lagrange's formula

$$(a^2+b^2)(c^2+d^2) = (ad+bc)^2 + (ac-bd)^2.$$

Actually, combining theorem 4.1 and proposition 4.2, it is easy to completely characterize the elements of this set: they are precisely the nonnegative integers $n$ such that $v_p(n)$ is even for all primes $p \equiv 3 \pmod 4$.


10. Prove that a positive integer can be written as the sum of two perfect squares if and only if it can be written as the sum of the squares of two rational numbers.

Fermat

Proof. One direction being clear, let us prove that if $n$ is the sum of the squares of two rational numbers, then it is also the sum of the squares of two integers. Thus, we know that $na^2 = b^2 + c^2$ for some integers $a, b, c$ with $a \ne 0$. For any prime $p$, we deduce that

$$v_p(n) + 2v_p(a) = v_p(b^2+c^2),$$

so that $v_p(n)$ has the same parity as $v_p(b^2+c^2)$. Since for any prime $p \equiv 3 \pmod 4$ we have $v_p(b^2+c^2) \equiv 0 \pmod 2$, it follows that for any such prime $p$ we have $v_p(n) \equiv 0 \pmod 2$. The result now follows from the preliminary discussion. □


Remark 4.3. Actually, the following result holds: if $n \ge 2$ is an integer and if an integer $x$ can be written as a sum of $n$ squares of rational numbers, then it can also be written as a sum of $n$ squares of integers. For $n = 3$, this follows from Davenport-Cassels' lemma ([3], chapter 13, example 12), while for $n \ge 4$, this follows from the famous theorem of Lagrange, according to which any nonnegative integer is the sum of four squares ([3], chapter 13, example 5).


11. Prove that each prime $p$ of the form $4k+1$ can be represented in exactly one way as the sum of the squares of two integers, up to the order and signs of the terms.

Fermat

Proof. The fact that any such prime $p$ is a sum of two squares is the content of theorem 4.1. Let us focus on the uniqueness part. Assume that we have $p = x^2 + y^2 = z^2 + t^2$. Then $(x-z)(x+z) = (t-y)(t+y)$. We will need the following very useful

Lemma 4.4. If $a, b, c, d$ are nonzero integers such that $ab = cd$, then there are integers $m, n, p, q$ such that $a = mn$, $b = pq$, $c = mp$, $d = nq$.

Proof. Let $\frac{a}{c} = \frac{d}{b} = \frac{n}{p}$ be the representation of the fraction $\frac{a}{c}$ in lowest terms. Since $ap = nc$, we have $p \mid c$, so we can write $c = mp$ for some $m$. Then also $a = mn$. Doing the same with $dp = nb$ yields the conclusion. □

Coming back to the problem, assume that $|x| \ne |z|$ and $|t| \ne |y|$. Then by the lemma we can find nonzero integers $m_1, n_1, p_1, q_1$ such that

$$x - z = m_1n_1, \quad x + z = p_1q_1, \quad t - y = m_1p_1, \quad t + y = n_1q_1.$$

But then

$$x = \frac{m_1n_1 + p_1q_1}{2}, \quad y = \frac{n_1q_1 - m_1p_1}{2},$$

so that using Lagrange's identity we obtain

$$4p = 4(x^2+y^2) = (m_1^2+q_1^2)(n_1^2+p_1^2).$$

We may assume that $p$ divides $m_1^2+q_1^2$, so that $n_1^2+p_1^2$ is equal to 1, 2 or 4. As $m_1, n_1, p_1, q_1$ are nonzero, we obtain a contradiction unless $|n_1| = |p_1| = 1$. But in this case we get $x - z = \pm(t-y)$ and $x + z = \pm(t+y)$, thus

$$\{|x|, |z|\} = \{|t|, |y|\}$$

and we are done again. □


We continue with an easy exercise in Lagrange’s formula. 


12. Prove that the equation $3^k = m^2 + n^2 + 1$ has infinitely many solutions in positive integers.

Saint-Petersburg Olympiad

Proof. Guided by the formula

$$3^{2^n} - 1 = (2^2+2^2)(3^2+1) \cdots (3^{2^{n-1}}+1),$$

we will choose $k = 2^n$. Since all factors in the product are sums of two squares and since the set of numbers which are sums of two squares is stable by multiplication, it follows that $3^{2^n} - 1$ is always a sum of two squares. Since it is trivially not a perfect square (it is of the form $3k+2$), the conclusion follows. □

Proof. We will show that $3^{2k} = m^2 + n^2 + 1$ has infinitely many solutions. Indeed, start with the observation that $3^{2 \cdot 1} = 2^2 + 2^2 + 1$. On the other hand, if $(k, m, n)$ is a solution with $m \ge n$, then $(2k, 3^km - n, 3^kn + m)$ is also a solution. Indeed, this follows from

$$(3^km - n)^2 + (3^kn + m)^2 + 1 = (3^{2k}+1)(m^2+n^2) + 1 = 9^{2k} - 1 + 1 = 3^{4k}. \quad □$$
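The recursion of the second proof is easy to run. In the sketch below (the function name is ours) a triple $(k, m, n)$ stands for the identity $3^{2k} = m^2 + n^2 + 1$, and each step doubles $k$.

```python
def next_solution(k, m, n):
    """From 3^(2k) = m^2 + n^2 + 1 build the solution with k replaced by 2k."""
    return 2 * k, 3 ** k * m - n, 3 ** k * n + m


k, m, n = 1, 2, 2            # 3^2 = 2^2 + 2^2 + 1
chain = [(k, m, n)]
for _ in range(5):
    k, m, n = next_solution(k, m, n)
    chain.append((k, m, n))

for k, m, n in chain:
    # Every triple must satisfy the defining identity with positive m, n.
    assert 3 ** (2 * k) == m * m + n * n + 1 and m > 0 and n > 0

print(chain[:3])  # [(1, 2, 2), (2, 4, 8), (4, 28, 76)]
```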


In the following two problems we will use the fact that the density of the 
set of positive integers all of whose prime factors are of the form 4k + 1 (or 
4k + 3) is zero. This is a nontrivial result, for a proof of which we refer to [3], 
chapter 4, example 10. 


13. It is a long standing conjecture of Erdős that the equation

$$\frac{4}{n} = \frac{1}{x} + \frac{1}{y} + \frac{1}{z}$$

has solutions in positive integers for all positive integers $n$. Prove that the set of those $n$ for which this statement is true has density 1.

Proof. We look for solutions with $y = z$ and $x = na$ for some positive integer $a$. The equation becomes $4xy = n(y + 2x)$ or, equivalently, $y(4a-1) = 2na$. Thus, if we can find a prime factor $p$ of $n$ of the form $p = 4a - 1$, then we can take $y = \frac{2na}{p}$ and we have a solution in positive integers. Thus, it is enough to prove that the set of integers having at least one prime factor of the form $4k-1$ has density 1, which has already been discussed. □
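The construction in the proof is explicit, so it can be turned into a small routine. This is a sketch of that construction only (the function name is ours); it requires a prime $p \equiv 3 \pmod 4$ dividing $n$ to be supplied.

```python
from fractions import Fraction


def erdos_straus_from_prime(n, p):
    """Given a prime p = 4a - 1 dividing n, return (x, y, z) with 4/n = 1/x + 1/y + 1/z."""
    assert n % p == 0 and p % 4 == 3
    a = (p + 1) // 4
    x = n * a
    y = 2 * n * a // p      # exact division, because p divides n
    return x, y, y


for n, p in [(21, 3), (21, 7), (33, 11), (95, 19)]:
    x, y, z = erdos_straus_from_prime(n, p)
    assert Fraction(4, n) == Fraction(1, x) + Fraction(1, y) + Fraction(1, z)
    print(n, (x, y, z))
```

For $n = 21$ and $p = 3$ this gives $\frac{4}{21} = \frac{1}{21} + \frac{1}{14} + \frac{1}{14}$.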


14. Let $T$ be the set of positive integers $n$ for which the equation $n^2 = a^2 + b^2$ has solutions in positive integers. Prove that $T$ has density 1.

Moshe Laub, AMM 6583

Proof. We will prove that $n \in T$ if and only if $n$ has at least one prime factor of the form $4k+1$. Suppose first that $n \in T$ and choose positive integers $a, b$ such that $n^2 = a^2 + b^2$. If all prime factors $p$ of $n$ are of the form $4k-1$, then for any such $p$ we have $p \mid a^2+b^2$, so that $p$ divides $a$ and $p$ divides $b$. Dividing the previous relation by $p^2$ and repeating the argument, we deduce that $p^{v_p(n)}$ divides $a$ and $b$. Since this happens for all $p \mid n$, it follows that $n$ divides $a$ and $b$, which is clearly impossible. Conversely, if $n$ has a prime factor $p \equiv 1 \pmod 4$, by Fermat's theorem we can find integers $c, d$ such that $p = c^2 + d^2$. We may assume that $c, d$ are positive (they are nonzero since primes are not perfect squares). But then

$$n^2 = \left(\frac{n(c^2-d^2)}{p}\right)^2 + \left(\frac{2ncd}{p}\right)^2$$

and so $n \in T$. Now, the density of those numbers which are not divisible by any prime of the form $4k+1$ is 0 and we are done. □





We continue with two very nice problems concerning primes of the form 
4k+1. The method used in the solution of the following problem is standard. 


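15. Let $p$ be a prime number of the form $4k+1$. Prove that

$$\sum_{j=1}^{\frac{p-1}{4}} \left\lfloor\sqrt{jp}\right\rfloor = \frac{p^2-1}{12}.$$

Proof. Write $p = 4k+1$ and note that

$$\sum_{j=1}^{k}\left\lfloor\sqrt{jp}\right\rfloor = \sum_{j=1}^{k}\sum_{i^2 < jp} 1 = \sum_{i=1}^{2k}\ \sum_{k \ge j > \frac{i^2}{p}} 1,$$

since the inequality $i < \sqrt{jp}$ with $1 \le j \le k$ implies that $i \le 2k$. On the other hand, the condition $j > \frac{i^2}{p}$ is equivalent to $j \ge 1 + \left\lfloor\frac{i^2}{p}\right\rfloor$, since $\frac{i^2}{p}$ is not an integer. Thus we can also write

$$\sum_{j=1}^{k}\left\lfloor\sqrt{jp}\right\rfloor = \sum_{i=1}^{2k}\left(k - \left\lfloor\frac{i^2}{p}\right\rfloor\right) = 2k^2 - \sum_{i=1}^{2k}\left\lfloor\frac{i^2}{p}\right\rfloor$$

and it remains to prove that

$$\sum_{i=1}^{2k}\left\lfloor\frac{i^2}{p}\right\rfloor = 2k^2 - \frac{p^2-1}{12} = \frac{2k(k-1)}{3}.$$

Note that

$$i^2 \pmod p = i^2 - p\left\lfloor\frac{i^2}{p}\right\rfloor$$

and since

$$\sum_{i=1}^{2k} i^2 = \frac{pk(2k+1)}{3},$$

it remains to prove that the sum of the quadratic residues mod $p$ is $pk$. But if $x_1, x_2, \dots, x_{2k}$ are the nonzero quadratic residues mod $p$ (any residue mod $p$ is implicitly taken between 0 and $p-1$), then $p - x_1, p - x_2, \dots, p - x_{2k}$ are a permutation of the $x_i$'s (since $-1$ is a quadratic residue mod $p$, which follows from $p \equiv 1 \pmod 4$). Thus

$$\sum_{i=1}^{2k} x_i = \sum_{i=1}^{2k} (p - x_i)$$

and the result follows. □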
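The identity is pleasant to test numerically with integer square roots. The list of primes and the function name below are our choices.

```python
from math import isqrt


def floor_sqrt_sum(p):
    """Sum of floor(sqrt(j*p)) for 1 <= j <= (p - 1) / 4."""
    return sum(isqrt(j * p) for j in range(1, (p - 1) // 4 + 1))


primes_1_mod_4 = [5, 13, 17, 29, 37, 41, 53, 61, 73, 89, 97]
print(all(12 * floor_sqrt_sum(p) == p * p - 1 for p in primes_1_mod_4))
```

For $p = 13$ the sum is $3 + 5 + 6 = 14 = \frac{13^2 - 1}{12}$.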


The following functional equation is rather nonstandard. 


16. Find all functions $f : \mathbb{Z}^+ \to \mathbb{Z}$ with the properties:

1. $f(a) \ge f(b)$ whenever $a$ divides $b$.

2. for all positive integers $a$ and $b$,

$$f(ab) + f(a^2+b^2) = f(a) + f(b).$$

Gabriel Dospinescu, Mathlinks Contest

Proof. Since 1 divides any integer, it follows that $f(1) \ge f(x)$ for all $x$. Let $k = f(1)$.

The first step is to prove that $f(n)$ only depends on the prime factors of $n$ and not their multiplicities, i.e.

$$f(n) = f\Big(\prod_{p \mid n} p\Big).$$

Indeed, consider two positive integers $a, m$ and choose $b = am$. Thus

$$f(a^2m) + f(a^2(m^2+1)) = f(a) + f(am)$$

and by the first condition $f(a) \ge f(a^2(m^2+1))$ and $f(am) \ge f(a^2m)$. Thus both of these inequalities must be equalities and so $f(am) = f(a^2m)$ for any positive integers $a, m$. This immediately proves the claim.

In the second step, we prove that $f(n)$ does not depend on the prime factors of $n$ that are congruent to 1 or 2 modulo 4. Indeed, the proof of the previous claim shows that for all $n$ and $x$ we have $f(n) = f(n(x^2+1))$. In particular, $f(n) = f(2n)$ and so we don't have to care about possible powers of 2 in the prime factorization of $n$. Also, if $p \equiv 1 \pmod 4$ and $x$ is chosen such that $p \mid x^2+1$, then

$$f(n) \ge f(np) \ge f(n(x^2+1)) = f(n)$$

and so $f(np) = f(n)$. In conclusion, we have

$$f(n) = f\Big(\prod_{p \mid n,\ p \in P_3} p\Big),$$

where $P_3$ is the set of prime numbers of the form $4k+3$. Let $p_1, p_2, \dots$ be the elements of $P_3$ and define $g(A) = f\big(\prod_{a \in A} p_a\big)$. We obtain a function $g$ defined on the set of all finite subsets of $\mathbb{N}$ with integral values. Moreover, we claim that $g(\emptyset) = k$, $S_1 \subset S_2 \Rightarrow g(S_1) \ge g(S_2)$ and finally

$$g(S_1) + g(S_2) = g(S_1 \cup S_2) + g(S_1 \cap S_2).$$

The first two relations are obvious. For the third one, note that if the sets of prime factors congruent to 3 mod 4 of $a$ and of $b$ are $A, B$, then the set of prime factors of the form $4j+3$ of $ab$ is exactly $A \cup B$ and the set of prime factors of the form $4j+3$ of $a^2+b^2$ is $A \cap B$. If $g(\{n\}) = k_n$ for some integers $k_n \le k$, then an easy induction on $|A|$ shows that

$$g(A) = \sum_{a \in A} k_a - (|A| - 1)k.$$

Conversely, any choice of such $k_n$ yields a corresponding $g$ and the previous construction yields a solution $f$ of the equation. This ends the solution. □
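To see the final construction at work, one can build one such $f$ and test both conditions on a finite range. Everything specific below is a hypothetical choice made for illustration: the value `K` of $f(1)$, the rule `k_p` assigning an integer $k_p \le K$ to each prime $p \equiv 3 \pmod 4$ (indexed here by the prime itself rather than by its rank), and the test range.

```python
def primes_3_mod_4_dividing(n):
    """Set of primes p = 3 (mod 4) dividing n, found by trial division."""
    found, d = set(), 2
    while d * d <= n:
        if n % d == 0:
            if d % 4 == 3:
                found.add(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1 and n % 4 == 3:   # the remaining cofactor is prime
        found.add(n)
    return found


K = 10                          # hypothetical value of f(1)


def k_p(p):
    """Hypothetical integers k_p <= K attached to the primes p = 3 (mod 4)."""
    return K - (p % 7)


def f(n):
    A = primes_3_mod_4_dividing(n)
    return sum(k_p(p) for p in A) - (len(A) - 1) * K


R = range(1, 31)
cond1 = all(f(a) >= f(b) for a in R for b in R if b % a == 0)
cond2 = all(f(a * b) + f(a * a + b * b) == f(a) + f(b) for a in R for b in R)
print(f(1) == K, cond1, cond2)
```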




The following challenging problem requires some estimates about prime 
numbers that follow from Dirichlet’s theorem, for which we refer the reader to 
addendum 7.A. 


17. Prove that the equation $x^8 = n! + 1$ has only finitely many solutions in nonnegative integers.

Proof. Suppose that $(x, n)$ is a solution of the equation $x^8 = n! + 1$. Then

$$n! = (x^2-1)(x^2+1)(x^4+1).$$

Let $A_n$ be the set of prime numbers $p \le n$ of the form $4k+3$. The key point is that no $p \in A_n$ can divide $x^2+1$ or $x^4+1$, so that $p^{v_p(n!)}$ must divide $x^2-1$. Thus we obtain

$$\sqrt[4]{n!} > x^2 - 1 \ge \prod_{p \in A_n} p^{v_p(n!)}.$$

Then, using that $v_p(n!) \ge n/p - 1$ (by Legendre's formula), we obtain

$$\frac{1}{4} n \ln n > \ln\sqrt[4]{n!} > \sum_{p \in A_n}\left(\frac{n}{p} - 1\right)\ln p.$$

A classical inequality of Erdős (theorem 3.A.3, chapter 3) yields

$$\sum_{p \in A_n} \ln p \le \ln\prod_{p \le n} p \le n\ln 4.$$

We deduce that

$$\sum_{p \in A_n}\frac{\ln p}{p} < \ln 4 + \frac{\ln n}{4}$$

for any solution $(x, n)$.

Now, it remains to prove that there are only finitely many such integers $n$. This is a consequence of the proof of Dirichlet's theorem, which establishes, among many other things, that

$$\frac{1}{\ln n}\sum_{p \in A_n}\frac{\ln p}{p} \to \frac{1}{2}$$

for $n \to \infty$. See addendum 7.A for a proof. □




It is really amazing that the following result has a purely elementary 
proof. We present here the beautiful idea of John H.E.Cohn and we refer the 
reader to [18] for other similar results (including the fact that 1 and 144 are 
the only squares in the Fibonacci sequence). 


18. Let $L_0 = 2$, $L_1 = 1$ and $L_{n+2} = L_{n+1} + L_n$ be Lucas's famous sequence. Then the only $n > 1$ for which $L_n$ is a perfect square is $n = 3$.

Cohn's theorem

Proof. If $x_1$ and $x_2$ are the roots of the polynomial $X^2 - X - 1$, then $L_n = x_1^n + x_2^n$, which combined with $x_1x_2 = -1$ yields $L_{2n} = L_n^2 - 2(-1)^n$. This already shows that if $L_n$ is a perfect square, then $n$ is odd, for $y^2 \pm 2$ is never a perfect square.

The case when $n$ is odd is much more subtle. There are two key properties of the Lucas numbers that make everything work. The first is that $L_k \equiv 3 \pmod 4$ whenever $k$ is an even number not a multiple of 3 (use the previous formula for $L_{2n}$ and the fact that $L_n$ is odd whenever $n$ is not a multiple of 3). Call such a number $k$ good. The second key ingredient is the fact that $L_k$ divides $L_{n+2k} + L_n$ for all good numbers $k$ and all nonnegative integers $n$. This follows from $k \equiv 0 \pmod 2$, the equality $x_1x_2 = -1$ and the computation

$$L_{n+2k} + L_n = x_1^n(x_1^{2k}+1) + x_2^n(x_2^{2k}+1) = x_1^{n+k}(x_1^k + x_2^k) + x_2^{n+k}(x_1^k + x_2^k) = L_kL_{n+k}.$$

Assume now that $n \equiv 1 \pmod 4$ and $n > 1$. Then we can write $n = 1 + 2 \cdot 3^r k$ for a nonnegative number $r$ and a good number $k$. Applying the second key point $3^r$ times, we deduce that $L_n \equiv -L_1 = -1 \pmod{L_k}$. As $L_k \equiv 3 \pmod 4$, it has a prime divisor of the form $4j+3$ and so $L_n$ cannot be a perfect square.

Assume finally that $n \equiv 3 \pmod 4$ and $n > 3$. Then we can write $n = 3 + 2 \cdot 3^r k$ for a nonnegative number $r$ and a good number $k$. The same argument shows that $L_n \equiv -L_3 = -4 \pmod{L_k}$ and we reach the same conclusion, as numbers of the form $x^2+4$ with $x$ odd have no prime factors of the form $4j+3$. □
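Both key properties, as well as the conclusion, can be checked on an initial segment of the sequence. The ranges below are arbitrary choices of ours, and the check is of course no substitute for the proof.

```python
from math import isqrt


def lucas(limit):
    """Return the list [L_0, L_1, ..., L_limit]."""
    L = [2, 1]
    while len(L) <= limit:
        L.append(L[-1] + L[-2])
    return L


L = lucas(300)

# Indices 2 <= n <= 200 for which L_n is a perfect square.
square_indices = [n for n in range(2, 201) if isqrt(L[n]) ** 2 == L[n]]

# "Good" numbers: even and not a multiple of 3.
good = [k for k in range(2, 100, 2) if k % 3 != 0]
prop1 = all(L[k] % 4 == 3 for k in good)
prop2 = all((L[n + 2 * k] + L[n]) % L[k] == 0 for k in good for n in range(50))

print(square_indices, prop1, prop2)  # [3] True True
```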




4.1 Notes 


Many of the solutions to the problems in this chapter were provided by the 
following people: Alexandru Chirvasitu (problem 14), Daniel Harrer (problems 
2, 3), Benjamin Gunby (problems 1, 6, 9), Fedja Nazarov (problems 12, 15), 
Gjergji Zaimi (problems 5, 7, 13, 16). 


Chapter 5 


T2's Lemma


All of the following problems fall to a rather handy inequality, known as T2's lemma even though it is a special case of the Cauchy-Schwarz inequality. This result says that for all real numbers $a_1, a_2, \dots, a_n$ and all positive real numbers $x_1, x_2, \dots, x_n$ the following inequality holds

$$\frac{a_1^2}{x_1} + \frac{a_2^2}{x_2} + \dots + \frac{a_n^2}{x_n} \ge \frac{(a_1 + a_2 + \dots + a_n)^2}{x_1 + x_2 + \dots + x_n}.$$

To get the reader familiar with this trick, we start with a series of more direct applications.
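Before the applications, here is a quick randomized sanity check of the lemma with exact rational arithmetic. The function name `t2_gap` and the sampling ranges are our choices; equality occurs when the $a_i$ are proportional to the $x_i$.

```python
import random
from fractions import Fraction


def t2_gap(a, x):
    """Left side minus right side of T2's lemma; nonnegative when all x_i > 0."""
    lhs = sum(ai * ai / xi for ai, xi in zip(a, x))
    rhs = sum(a) ** 2 / sum(x)
    return lhs - rhs


random.seed(0)
gaps = []
for _ in range(500):
    n = random.randint(1, 6)
    a = [Fraction(random.randint(-50, 50), 10) for _ in range(n)]
    x = [Fraction(random.randint(1, 50), 10) for _ in range(n)]
    gaps.append(t2_gap(a, x))

print(min(gaps) >= 0)
# Equality case: a = 2x.
print(t2_gap([Fraction(2), Fraction(4), Fraction(6)],
             [Fraction(1), Fraction(2), Fraction(3)]) == 0)
```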


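1. Let $x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n$ be positive real numbers such that

$$x_1 + x_2 + \dots + x_n \ge x_1y_1 + x_2y_2 + \dots + x_ny_n.$$

Prove that

$$x_1 + x_2 + \dots + x_n \le \frac{x_1}{y_1} + \frac{x_2}{y_2} + \dots + \frac{x_n}{y_n}.$$

Romeo Ilie, Romania 1999

Proof. Using T2's lemma, we can write

$$\sum\frac{x_i}{y_i} = \sum\frac{x_i^2}{x_iy_i} \ge \frac{\left(\sum x_i\right)^2}{\sum x_iy_i} \ge \sum x_i,$$

the last inequality being exactly the hypothesis. □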


2. Let $a, b, c$ be nonzero real numbers such that $ab + bc + ca \ge 0$. Prove that

$$\frac{ab}{a^2+b^2} + \frac{bc}{b^2+c^2} + \frac{ca}{c^2+a^2} \ge -\frac{1}{2}.$$

Titu Andreescu

Proof. The trick is to add $\frac12$ to each fraction, in order to exploit the identity

$$\frac{ab}{a^2+b^2} + \frac{1}{2} = \frac{(a+b)^2}{2(a^2+b^2)}.$$

With this observation, the inequality becomes

$$\sum\frac{(a+b)^2}{a^2+b^2} \ge 2$$

and it is then a trivial consequence of T2's lemma, combined with the hypothesis $ab + bc + ca \ge 0$. □





3. Prove that for any positive real numbers $a, b, c, d$ satisfying

$$ab + bc + cd + da = 1,$$

the following inequality holds

$$\frac{a^3}{b+c+d} + \frac{b^3}{c+d+a} + \frac{c^3}{d+a+b} + \frac{d^3}{a+b+c} \ge \frac{1}{3}.$$

IMO 1990 Shortlist

Proof. T2's lemma gives a lower bound

$$\sum\frac{a^3}{b+c+d} = \sum\frac{a^4}{a(b+c+d)} \ge \frac{\left(\sum a^2\right)^2}{\sum a(b+c+d)}.$$

Since

$$\sum a(b+c+d) = \left(\sum a\right)^2 - \sum a^2 \le 3\sum a^2,$$

it remains to prove that $\sum a^2 \ge 1$. But again by Cauchy-Schwarz we have

$$1 = (ab + bc + cd + da)^2 \le (a^2+b^2+c^2+d^2)^2,$$

which proves that $\sum a^2 \ge 1$ and finishes the solution. □


4. Prove that if the positive real numbers $a, b, c$ satisfy $abc = 1$, then

$$\frac{a}{b+c+1} + \frac{b}{c+a+1} + \frac{c}{a+b+1} \ge 1.$$

Vasile Cartoaje, Gazeta Matematica

Proof. The solution using T2's lemma is straightforward:

$$\sum\frac{a}{b+c+1} = \sum\frac{a^2}{a(b+c+1)} \ge \frac{\left(\sum a\right)^2}{2\sum ab + \sum a},$$

so that we only need to prove the inequality $\sum a^2 \ge \sum a$. This follows from $\sum a^2 \ge \frac{\left(\sum a\right)^2}{3}$ and $\sum a \ge 3$ (the first being a consequence of Cauchy-Schwarz, the second being the AM-GM inequality). □


5. Let $a, b, c$ be real numbers such that

$$\frac{1}{a^2+1} + \frac{1}{b^2+1} + \frac{1}{c^2+1} = 2.$$

Prove that

$$ab + bc + ca \le \frac{3}{2}.$$

Titu Andreescu

Proof. Observe that

$$\sum\frac{a^2}{a^2+1} = 3 - \sum\frac{1}{a^2+1} = 1.$$

On the other hand, we have

$$\sum\frac{a^2}{a^2+1} \ge \frac{(a+b+c)^2}{a^2+b^2+c^2+3}.$$

Combining the two inequalities immediately yields the result. □


The following problems are a bit less straightforward. However, they do 
not require any heavy machinery. 


6. Prove that for any positive real numbers $a, b, c$,

$$\frac{1}{a+b} + \frac{1}{b+c} + \frac{1}{c+a} + \frac{1}{2\sqrt[3]{abc}} \ge \frac{\left(a+b+c+\sqrt[3]{abc}\right)^2}{(a+b)(b+c)(c+a)}.$$

Titu Andreescu, MOSP 1999

Proof. The solution using T2's lemma is a bit tricky: the point is to look at the denominator of the right-hand side, because

$$(a+b)(b+c)(c+a) = (a+b)c^2 + (b+c)a^2 + (c+a)b^2 + 2abc.$$

This suggests writing the left-hand side in the following way

$$\frac{c^2}{c^2(a+b)} + \frac{a^2}{a^2(b+c)} + \frac{b^2}{b^2(a+c)} + \frac{\left(\sqrt[3]{abc}\right)^2}{2abc}.$$

A direct application of T2's lemma finishes the proof. □


7. Prove that for any positive real numbers $a, b, c$ the following inequality holds

$$\left(\frac{a}{b+c}\right)^2 + \left(\frac{b}{c+a}\right)^2 + \left(\frac{c}{a+b}\right)^2 \ge \frac{3}{4} \cdot \frac{a^2+b^2+c^2}{ab+bc+ca}.$$

Gabriel Dospinescu

Proof. Using T2's lemma, it is sufficient to prove the inequality

$$\frac{(a^2+b^2+c^2)^2}{a^2(b+c)^2 + b^2(c+a)^2 + c^2(a+b)^2} \ge \frac{3}{4} \cdot \frac{a^2+b^2+c^2}{ab+bc+ca}.$$

Note however that the obvious application of T2's lemma fails. The previous inequality is equivalent to

$$4(a^2+b^2+c^2)(ab+bc+ca) \ge 3\left(a^2(b+c)^2 + b^2(c+a)^2 + c^2(a+b)^2\right).$$

Unfortunately, the only reasonable way to prove this is to expand it in the form

$$\sum ab(a^2+b^2) \ge \frac{3}{2}\sum a^2b^2 + \frac{1}{2}abc\sum a.$$

Fortunately, this is trivial, since

$$\sum ab(a^2+b^2) \ge 2\sum a^2b^2 \quad\text{and}\quad \sum a^2b^2 \ge abc\sum a. \quad □$$


It is possible to solve the following problem using T2's lemma, but the proof is not really elegant. In the addendum we discuss some applications of Hölder's inequality, which makes this problem really easy.

8. Let $a, b, c$ be positive reals such that $abc = 1$. Show that

$$\frac{1}{a^5(b+2c)^2} + \frac{1}{b^5(c+2a)^2} + \frac{1}{c^5(a+2b)^2} \ge \frac{1}{3}.$$

Titu Andreescu, USA TST 2010

Proof. Start by making the substitution

$$x = \frac{1}{a}, \quad y = \frac{1}{b}, \quad z = \frac{1}{c}.$$

The inequality becomes

$$\frac{x^3}{(z+2y)^2} + \frac{y^3}{(x+2z)^2} + \frac{z^3}{(y+2x)^2} \ge \frac{1}{3}.$$

Applying T2's lemma (after writing $\frac{x^3}{(z+2y)^2} = \frac{x^5}{x^2(z+2y)^2}$), it is enough to prove that

$$3\left(x^{5/2} + y^{5/2} + z^{5/2}\right)^2 \ge \sum x^2(z+2y)^2.$$

The right-hand side is equal to $5\sum x^2y^2 + 4\sum x$. Thus, we need to prove that

$$3\sum x^5 + 6\sum (xy)^{5/2} \ge 5\sum (xy)^2 + 4\sum x.$$

Note that

$$\sum (xy)^{5/2} = \sum (xy)^2\sqrt{xy} \ge \frac{1}{3}\sum (xy)^2 \cdot \sum\sqrt{xy} \ge \sum (xy)^2,$$

the first inequality being Chebyshev's, the second one by the AM-GM inequality and the fact that $xyz = 1$. Thus, it suffices to prove that

$$3\sum x^5 + \sum x^2y^2 \ge 4\sum x.$$

But $3x^5 + y^2z^2 \ge 4x^{13/4}$ and so it is enough to prove that $\sum x^{13/4} \ge \sum x$. This follows from the power-mean inequality and the fact that $x + y + z \ge 3$. □

Proof. As in the previous solution, we reduce the problem to proving the following inequality

$$\frac{x^3}{(z+2y)^2} + \frac{y^3}{(x+2z)^2} + \frac{z^3}{(y+2x)^2} \ge \frac{1}{3}.$$

Using the AM-GM inequality, we can write

$$\frac{x^3}{(2y+z)^2} + \frac{2y+z}{27} + \frac{2y+z}{27} \ge \frac{x}{3}$$

and two similar inequalities. Adding them yields the following estimate

$$\frac{x^3}{(z+2y)^2} + \frac{y^3}{(x+2z)^2} + \frac{z^3}{(y+2x)^2} \ge \frac{x+y+z}{9}$$

and we end up using once more the AM-GM inequality. □


211 


It is really not easy to prove the following inequality using T>’s lemma, 
but the trick is worth remembering. 


9. Prove that for any positive real numbers a, b,c the following inequality 
holds 


1 1 1 1 1 1 
pg > ep Lp 
3a4b 3b4c. 3e4a~ Datbte. Mtcta wtatd 


M.O. Drambe 


Proof. Choose three positive real numbers $\alpha, \beta, \gamma$ and use $T_2$'s lemma in the
form

\[ \frac{\alpha}{3a+b} + \frac{\beta}{3b+c} + \frac{\gamma}{3c+a} \ge \frac{(\alpha+\beta+\gamma)^2}{a(3\alpha+\gamma) + b(3\beta+\alpha) + c(3\gamma+\beta)}. \]

Now, we impose the conditions

\[ 3\alpha+\gamma = 2, \quad 3\beta+\alpha = 1, \quad 3\gamma+\beta = 1. \]

Solving this linear system yields the solution $\alpha = \frac47$, $\beta = \frac17$, $\gamma = \frac27$. Therefore,
we obtain the inequality

\[ \frac47\cdot\frac{1}{3a+b} + \frac17\cdot\frac{1}{3b+c} + \frac27\cdot\frac{1}{3c+a} \ge \frac{1}{2a+b+c}. \]

Proceeding in the same way with the two other terms of the right-hand side and
adding up the resulting inequalities yields the desired result. □
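
The linear system for the weights can be checked mechanically. The following is only a small sketch in exact arithmetic; the helper names `det3` and `solve3` are ours and are not part of the original solution.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system exactly with Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(Fraction(det3(replaced), d))
    return solution

# Unknowns in the order (alpha, beta, gamma):
#   3*alpha + gamma = 2,  alpha + 3*beta = 1,  beta + 3*gamma = 1.
alpha, beta, gamma = solve3([[3, 0, 1], [1, 3, 0], [0, 1, 3]], [2, 1, 1])
print(alpha, beta, gamma)
```

Since the three weights add up to 1, the numerator $(\alpha+\beta+\gamma)^2$ in $T_2$'s lemma is 1, and each fraction $\frac{1}{3a+b}$ receives total weight 1 after the three cyclic inequalities are added.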


Proof. We can also use the trick of integrating polynomial inequalities to
deduce fractional inequalities. Namely, the inequality

\[ x^3y + y^3z + z^3x \ge xyz(x+y+z) \]

can be easily proved using $T_2$'s lemma, since it can be written in the form

\[ \frac{x^2}{z} + \frac{y^2}{x} + \frac{z^2}{y} \ge x+y+z. \]

Using this inequality with $x = t^a$, $y = t^b$, $z = t^c$ and dividing by $t$, we can write

\[ t^{3a+b-1} + t^{3b+c-1} + t^{3c+a-1} \ge t^{2a+b+c-1} + t^{2b+c+a-1} + t^{2c+a+b-1} \]

for all $0 < t < 1$. Integrating this between 0 and 1 yields the desired inequality. □


Remark 5.1. The technique used in the second proof looks unusual. It is 
actually quite powerful and we refer the reader to [3], chapter 19 for many 
more applications. 


The following two problems are closely related and use a rather useful 
inequality. 


10. Prove that for all $n \ge 4$ and all $x_1, x_2, \ldots, x_n > 0$,

\[ \frac{x_1}{x_n+x_2} + \frac{x_2}{x_1+x_3} + \cdots + \frac{x_n}{x_{n-1}+x_1} \ge 2. \]

Tournament of the Towns 1982


Proof. We start in the usual way by using $T_2$'s lemma:

\[ \frac{x_1}{x_n+x_2} + \frac{x_2}{x_1+x_3} + \cdots + \frac{x_n}{x_{n-1}+x_1} \ge \frac{(x_1+x_2+\cdots+x_n)^2}{x_1(x_n+x_2) + x_2(x_1+x_3) + \cdots + x_n(x_{n-1}+x_1)}. \]

Since

\[ x_1(x_n+x_2) + x_2(x_1+x_3) + \cdots + x_n(x_{n-1}+x_1) = 2(x_1x_2 + x_2x_3 + \cdots + x_nx_1), \]

it remains to prove the following very useful

Lemma 5.2. If $n \ge 4$ and $x_1, x_2, \ldots, x_n$ are nonnegative real numbers, then

\[ (x_1+x_2+\cdots+x_n)^2 \ge 4(x_1x_2 + x_2x_3 + \cdots + x_nx_1). \]




The proof is a bit tricky. If $n$ is even, the inequality follows trivially from
the chain of inequalities

\[ 4(x_1x_2 + \cdots + x_nx_1) \le 4(x_1+x_3+\cdots+x_{n-1})(x_2+x_4+\cdots+x_n) \le (x_1+x_2+\cdots+x_n)^2, \]

the first one being obvious and the second one being the AM-GM inequality.
For $n$ odd, things are subtler, but we can reduce the problem to the
case when $n$ is even by the following mixing argument: we may assume that
$x_1 \ge x_2$, so that we trivially have

\[ x_1x_2 + x_2x_3 + x_3x_4 \le x_1x_2 + x_1x_3 + x_3x_4 \le x_1(x_2+x_3) + (x_2+x_3)x_4. \]

Thus, replacing $x_1, x_2, \ldots, x_n$ by $x_1, x_2+x_3, x_4, \ldots, x_n$, we preserve the sum
of the $x_i$'s while not decreasing the quantity $x_1x_2 + \cdots + x_nx_1$. Since $n-1$ is
even, everything follows from the previous step. □
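
Lemma 5.2 and the mixing step lend themselves to a quick experiment. The sketch below is our own illustration with integer samples, so that all comparisons are exact; the function names are hypothetical and a random test is of course not a proof. It also shows that the hypothesis $n \ge 4$ cannot be dropped, since $x_1 = x_2 = x_3 = 1$ violates the inequality.

```python
import random

def cyclic_sum(xs):
    """x1*x2 + x2*x3 + ... + xn*x1."""
    n = len(xs)
    return sum(xs[i] * xs[(i + 1) % n] for i in range(n))

def lemma_gap(xs):
    """(x1 + ... + xn)^2 - 4*(x1*x2 + ... + xn*x1); Lemma 5.2 says this is >= 0 for n >= 4."""
    return sum(xs) ** 2 - 4 * cyclic_sum(xs)

def mix(xs):
    """Mixing step for odd n: rotate so that x1 >= x2, then merge x2 and x3."""
    n = len(xs)
    start = next(i for i in range(n) if xs[i] >= xs[(i + 1) % n])
    ys = xs[start:] + xs[:start]
    return [ys[0], ys[1] + ys[2]] + ys[3:]

random.seed(0)
samples = [[random.randint(0, 20) for _ in range(n)]
           for n in range(4, 10) for _ in range(200)]
print(min(lemma_gap(xs) for xs in samples))  # smallest gap seen in the sample
print(lemma_gap([1, 1, 1]))                  # negative: the case n = 3 fails
```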

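11. Let $n \ge 4$ be an integer and let $a_1, a_2, \ldots, a_n$ be positive real numbers
such that $a_1^2 + a_2^2 + \cdots + a_n^2 = 1$. Prove that

\[ \frac{a_1}{a_2^2+1} + \frac{a_2}{a_3^2+1} + \cdots + \frac{a_n}{a_1^2+1} \ge \frac45\left(a_1\sqrt{a_1} + a_2\sqrt{a_2} + \cdots + a_n\sqrt{a_n}\right)^2. \]

Mircea Becheanu and Bogdan Enescu, Romanian TST 2002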


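Proof. Applying $T_2$'s lemma we obtain

\[ \sum \frac{a_i}{a_{i+1}^2+1} = \sum \frac{a_i^3}{a_i^2(a_{i+1}^2+1)} \ge \frac{\left(\sum a_i\sqrt{a_i}\right)^2}{\sum a_i^2(a_{i+1}^2+1)}. \]

Since $\sum a_i^2 = 1$, we have $\sum a_i^2a_{i+1}^2 \le 1/4$ by lemma 5.2. The result follows. □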


We give two proofs for the following beautiful problem. The first one
is a standard application of $T_2$'s lemma, the second one uses a very useful
technique.




12. Prove that for any positive real numbers $a,b,c$, the following inequality
holds

\[ \frac{a}{\sqrt{a^2+8bc}} + \frac{b}{\sqrt{b^2+8ca}} + \frac{c}{\sqrt{c^2+8ab}} \ge 1. \]

Hojoo Lee, IMO 2001


Proof. The most natural application of $T_2$'s lemma turns out to work very
smoothly. Indeed,

\[ \sum \frac{a}{\sqrt{a^2+8bc}} = \sum \frac{a^2}{a\sqrt{a^2+8bc}} \ge \frac{(a+b+c)^2}{\sum a\sqrt{a^2+8bc}}. \]

The only problem is to show that

\[ \sum a\sqrt{a^2+8bc} \le (a+b+c)^2. \]

This suggests using Cauchy-Schwarz and, indeed, we have

\[ \left(\sum a\sqrt{a^2+8bc}\right)^2 \le \sum a \cdot \left(\sum a^3 + 24abc\right). \]

Thus, the problem is solved if we can prove the inequality

\[ \sum a^3 + 24abc \le (a+b+c)^3. \]

Fortunately, after expansion, this becomes $\sum a(b-c)^2 \ge 0$, which is obvious. □


Proof. Make the substitution

\[ x = \frac{a}{\sqrt{a^2+8bc}}, \quad y = \frac{b}{\sqrt{b^2+8ac}}, \quad z = \frac{c}{\sqrt{c^2+8ab}}. \]

Multiplying the relation $\frac{1}{x^2} - 1 = \frac{8bc}{a^2}$ and the two similar ones yields the
relation

\[ \left(\frac{1}{x^2}-1\right)\left(\frac{1}{y^2}-1\right)\left(\frac{1}{z^2}-1\right) = 512. \]

The problem asks to prove that under this assumption we have $x+y+z \ge 1$.
Assume that this is not the case, so $x+y+z < 1$. Let

\[ X = \frac{x}{x+y+z} > x, \quad Y = \frac{y}{x+y+z} > y, \quad Z = \frac{z}{x+y+z} > z, \]

so that $X+Y+Z = 1$ and

\[ 512 > \left(\frac{1}{X^2}-1\right)\left(\frac{1}{Y^2}-1\right)\left(\frac{1}{Z^2}-1\right). \]

We deduce that

\[ 512X^2Y^2Z^2 > (X+Y)(Y+Z)(Z+X)(2X+Y+Z)(2Y+X+Z)(2Z+X+Y), \]

which is impossible since $X+Y \ge 2\sqrt{XY}$, $2X+Y+Z \ge 4\sqrt[4]{X^2YZ}$ and the
similar inequalities obtained by cyclic permutations. □


There are two traps in the following problem, making the problem harder 
than it appears at first sight. 


13. Let $a,b,c$ be positive real numbers such that $ab+bc+ca = 3$. Prove
that

\[ \frac{a}{2a+b^2} + \frac{b}{2b+c^2} + \frac{c}{2c+a^2} \le 1. \]

T.Q. Anh

Proof. The inequality we have to establish seems to be opposite to the usual
applications of $T_2$'s lemma. This is actually not a very serious problem, since
we can always change the terms of the sum a bit and change the sign of the
inequality. Indeed, note that

\[ \frac{a}{2a+b^2} = \frac12\left(1 - \frac{b^2}{2a+b^2}\right). \]

Thus, the inequality is equivalent to

\[ \frac{b^2}{2a+b^2} + \frac{c^2}{2b+c^2} + \frac{a^2}{2c+a^2} \ge 1. \]

And now, another trap: one would be tempted to use $T_2$'s lemma in the
obvious form, but it is not difficult to check that this does not work. Instead,
we take advantage of the hypothesis $ab+bc+ca = 3$ to write

\[ \sum \frac{b^2}{2a+b^2} = \sum \frac{b^3}{2ab+b^3} \ge \frac{\left(\sum b\sqrt{b}\right)^2}{6 + \sum b^3}. \]

Thus, it remains to prove that $\sum (ab)^{3/2} \ge 3$, which is immediate by the
power-mean inequality and the hypothesis. □





The following problem is also quite tricky. 
14. Determine the best constant $k_n$ such that for all positive real numbers
$a_1, a_2, \ldots, a_n$ satisfying $a_1a_2\cdots a_n = 1$, the following inequality holds

\[ \frac{a_1a_2}{(a_1^2+a_2)(a_2^2+a_1)} + \frac{a_2a_3}{(a_2^2+a_3)(a_3^2+a_2)} + \cdots + \frac{a_na_1}{(a_n^2+a_1)(a_1^2+a_n)} \le k_n. \]

Gabriel Dospinescu, Mircea Lascu

Proof. The basic inequality that will be used is

\[ (x^2+y)(y^2+x) \ge xy(1+x)(1+y). \]

This reduces immediately to $(x^2-y^2)(x-y) \ge 0$, which is clear. This already
shows that for $n = 2$ the maximum value is $k_2 = \frac12$, since $(1+x)(1+y) \ge 4$
if $xy = 1$.

Let us assume now that $n > 2$. We will prove that $k_n = n-2$. To prove
that $k_n \ge n-2$, it is enough to exhibit sequences $a_1, a_2, \ldots, a_n$ for which the
value of the expression

\[ F(a_1, a_2, \ldots, a_n) = \sum_{i=1}^n \frac{a_ia_{i+1}}{(a_i^2+a_{i+1})(a_{i+1}^2+a_i)} \]

is close to $n-2$. This can be done easily, by taking $n-1$ of the variables
equal to some $x$ very close to 0, and the last variable equal to $x^{1-n}$. The
difficult part is proving that $F(a_1, a_2, \ldots, a_n) \le n-2$ holds for any sequence
$a_1, a_2, \ldots, a_n$ as in the statement. Using the basic inequality, this reduces to
proving that

\[ \sum_{i=1}^n \frac{1}{(1+a_i)(1+a_{i+1})} \le n-2 \]

for any positive numbers $a_i$ with $a_1a_2\cdots a_n = 1$. To prove this, write $a_i = \frac{x_i}{x_{i+1}}$,
with $x_{n+1} = x_1$. Subtracting 1 from every fraction, we obtain the equivalent
inequality

\[ \sum_{i=1}^n \frac{x_ix_{i+1} + x_ix_{i+2} + x_{i+1}^2}{(x_i+x_{i+1})(x_{i+1}+x_{i+2})} \ge 2. \]

This is too complicated to try a Cauchy-Schwarz approach, but it simplifies a
lot if we observe that

\[ \frac{x_ix_{i+1} + x_ix_{i+2} + x_{i+1}^2}{(x_i+x_{i+1})(x_{i+1}+x_{i+2})} = \frac{x_i}{x_i+x_{i+1}} + \frac{x_{i+1}^2}{(x_i+x_{i+1})(x_{i+1}+x_{i+2})}. \]

Since

\[ \sum \frac{x_i}{x_i+x_{i+1}} \ge \sum \frac{x_i}{x_1+x_2+\cdots+x_n} = 1, \]

it remains to prove the inequality

\[ \sum \frac{x_{i+1}^2}{(x_i+x_{i+1})(x_{i+1}+x_{i+2})} \ge 1. \]

$T_2$'s lemma reduces this to the easier inequality

\[ \left(\sum x_i\right)^2 \ge \sum x_i^2 + 2\sum x_ix_{i+1} + \sum x_ix_{i+2}. \]

Expanding the left-hand side makes the previous inequality obvious and
finishes the proof of the fact that $k_n = n-2$ for $n \ge 3$. □


The following difficult problem seems to be exactly the opposite of what
is usually called a standard application of $T_2$'s lemma. It turns out that we
can actually apply $T_2$'s lemma, but in a very nontrivial way.




15. Prove that for any positive real numbers $a,b,c$,

\[ \frac{(2a+b+c)^2}{2a^2+(b+c)^2} + \frac{(2b+c+a)^2}{2b^2+(c+a)^2} + \frac{(2c+a+b)^2}{2c^2+(a+b)^2} \le 8. \]

Titu Andreescu and Zuming Feng, USAMO 2003


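Proof. Writing

\[ x = \frac{b+c}{a}, \quad y = \frac{a+c}{b}, \quad z = \frac{a+b}{c}, \]

we can write the inequality as

\[ \frac{(2+x)^2}{2+x^2} + \frac{(2+y)^2}{2+y^2} + \frac{(2+z)^2}{2+z^2} \le 8 \iff \frac{1+2x}{x^2+2} + \frac{1+2y}{y^2+2} + \frac{1+2z}{z^2+2} \le \frac52 \]

\[ \iff \frac{(x-1)^2}{x^2+2} + \frac{(y-1)^2}{y^2+2} + \frac{(z-1)^2}{z^2+2} \ge \frac12. \]

In order to prove the last inequality, we use $T_2$'s lemma in the obvious form

\[ \frac{(x-1)^2}{x^2+2} + \frac{(y-1)^2}{y^2+2} + \frac{(z-1)^2}{z^2+2} \ge \frac{(x+y+z-3)^2}{x^2+y^2+z^2+6}, \]

and so it suffices to show that

\[ \frac{(x+y+z-3)^2}{x^2+y^2+z^2+6} \ge \frac12. \]

Expanding $(x+y+z-3)^2$, we reduce this to proving that

\[ \sum x^2 - 12\sum x + 4\sum xy + 12 \ge 0. \]

This is not obvious, but the observation that $x^2+4 \ge 4x$ reduces it to proving
that $\sum xy \ge 2\sum x$, which is problem 7 in chapter 1. □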


Proof. This solution uses the linearization method. To simplify computations,
we may assume (since the inequality is homogeneous) that $a+b+c = 1$. Define
the map

\[ f(x) = \frac{(x+1)^2}{2x^2+(1-x)^2}. \]

The inequality can also be written as $f(a)+f(b)+f(c) \le 8$. We will try to find
$u,v$ such that $f(x) \le ux+v$ for any $x \in [0,1]$, with equality for $x = \frac13$ (which
is the obvious equality case in the original inequality). Thus, we should have
$f\left(\frac13\right) = \frac{u}{3}+v$ and also $f'\left(\frac13\right) = u$, since $\frac13$ would be a minimum of $ux+v-f(x)$
on $[0,1]$. A straightforward, but tedious computation shows that $u = 4$ and
$v = \frac43$. We need to prove now that this pair $(u,v)$ really works, i.e. that
$f(x) \le ux+v$ holds for any $x \in [0,1]$. Clearing denominators and expanding
everything yields (after a tedious computation) the equivalent inequality

\[ 36x^3 - 15x^2 - 2x + 1 \ge 0. \]

But the conditions we imposed were made so that the left-hand side is divisible
by $(3x-1)^2$. Doing the euclidean division shows that the left-hand side is
$(3x-1)^2(4x+1)$, which is obviously nonnegative. Finally, we just have to add the
inequalities $f(x) \le ux+v$ for $x \in \{a,b,c\}$ to end the proof. □
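
The two "tedious computations" of this proof can be delegated to a few lines of exact arithmetic. The following is a sketch of such a check, not part of the original solution; the helper `poly_mul` and the sampling grid are our own choices.

```python
from fractions import Fraction

def f(x):
    """f(x) = (x + 1)^2 / (2x^2 + (1 - x)^2), from the normalization a + b + c = 1."""
    return (x + 1) ** 2 / (2 * x ** 2 + (1 - x) ** 2)

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

U, V = 4, Fraction(4, 3)                                 # slope and intercept of the tangent line
cubic = poly_mul(poly_mul([-1, 3], [-1, 3]), [1, 4])     # (3x - 1)^2 (4x + 1)
grid = [Fraction(k, 60) for k in range(61)]              # sample points in [0, 1]
gaps = [U * x + V - f(x) for x in grid]                  # values of ux + v - f(x)
print(cubic, min(gaps))
```

The coefficient list `cubic` is to be compared with $36x^3 - 15x^2 - 2x + 1$ (read from the constant term upwards), and the smallest gap on the grid is attained at $x = \frac13$.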


The following problems are harder than the previous ones. They are still
based on $T_2$'s lemma, but applied in more subtle ways and often combined
with other tricks.


16. Let $a,b,c,d$ be positive real numbers such that $abcd = 1$. Prove that

\[ \frac{1}{(1+a)^2} + \frac{1}{(1+b)^2} + \frac{1}{(1+c)^2} + \frac{1}{(1+d)^2} \ge 1. \]

Vasile Cartoaje


Proof. The proof using $T_2$'s lemma is rather mysterious. By performing the
substitution

\[ a = \frac{y}{x}, \quad b = \frac{z}{y}, \quad c = \frac{t}{z}, \quad d = \frac{x}{t}, \]

we obtain the equivalent inequality

\[ \frac{x^2}{(x+y)^2} + \frac{y^2}{(y+z)^2} + \frac{z^2}{(z+t)^2} + \frac{t^2}{(t+x)^2} \ge 1. \]

Of course, an immediate application of $T_2$'s lemma fails rather badly, so we
need something more clever. Let us apply $T_2$'s lemma for the first two and
then for the last two terms of the inequality. If we try to prove the stronger
inequality

\[ \frac{(x+y)^2}{(x+y)^2+(y+z)^2} + \frac{(z+t)^2}{(z+t)^2+(t+x)^2} \ge 1, \]

we easily realize¹ that it is equivalent to $(y+z)(t+x) \le (x+y)(z+t)$.
Unfortunately there is no reason to have $(y+z)(t+x) \le (x+y)(z+t)$. The
miracle is that if this fails, then we can apply $T_2$'s lemma for the first and
fourth terms of the initial inequality and then for the middle terms. In this
case, we obtain the stronger inequality

\[ \frac{(x+t)^2}{(x+y)^2+(t+x)^2} + \frac{(y+z)^2}{(y+z)^2+(z+t)^2} \ge 1. \]

A similar argument shows that this holds precisely when $(x+y)(z+t) \le (y+z)(x+t)$,
in particular whenever $(y+z)(t+x) \le (x+y)(z+t)$ fails. The
result follows. □


Proof. This solution is not natural, either, but it is very elegant. We claim
that for all positive numbers $a,b$ we have

\[ \frac{1}{(1+a)^2} + \frac{1}{(1+b)^2} \ge \frac{1}{1+ab}. \]

This is rather easy to prove, though it requires some nasty computations.
Namely, by clearing denominators and performing the obvious simplifications,
we reduce the problem to proving that

\[ ab(a^2-ab+b^2) - 2ab + 1 \ge 0, \]

which is equivalent to

\[ ab(a-b)^2 + (ab-1)^2 \ge 0. \]

Once we have this inequality, we deduce that

\[ \sum \frac{1}{(1+a)^2} \ge \frac{1}{1+ab} + \frac{1}{1+cd} = \frac{1}{1+ab} + \frac{ab}{1+ab} = 1 \]

and the result follows. □


¹ It is convenient to denote $a = \frac{y+z}{x+y}$ and $b = \frac{t+x}{z+t}$.




The following problem is a bit tricky, especially because of the strange 
hypotheses. 


17. Let $n \ge 16$ be a positive integer and suppose that the positive numbers
$a_1, a_2, \ldots, a_n$ satisfy $a_1+a_2+\cdots+a_n = 1$ and $a_1+2a_2+\cdots+na_n = 2$.
Prove that

\[ (a_2-a_1)\sqrt2 + (a_3-a_2)\sqrt3 + \cdots + (a_n-a_{n-1})\sqrt{n} < 0. \]

Gabriel Dospinescu


Proof. The first step is to apply Abel's summation formula to rewrite the
inequality as

\[ a_1\sqrt2 + a_2(\sqrt3-\sqrt2) + \cdots + a_{n-1}(\sqrt{n}-\sqrt{n-1}) > a_n\sqrt{n}. \]

Using the obvious bound $a_1\sqrt2 > a_1(\sqrt2-1)$ and the formula

\[ \sqrt{k}-\sqrt{k-1} = \frac{1}{\sqrt{k}+\sqrt{k-1}}, \]

an application of $T_2$'s lemma shows that the left-hand side is greater than

\[ \frac{(a_1+a_2+\cdots+a_{n-1})^2}{\sum_{i=1}^{n-1} a_i(\sqrt{i}+\sqrt{i+1})}. \]

On the other hand, Cauchy-Schwarz together with the hypothesis give the
estimate

\[ \sum_{i=1}^{n-1} a_i\sqrt{i} < \sqrt2 \]

and also

\[ \sum_{i=1}^{n-1} a_i\sqrt{i+1} < \sqrt3. \]

Therefore, taking into account that $a_1+a_2+\cdots+a_{n-1} = 1-a_n$, we still need
to prove the inequality

\[ \frac{(1-a_n)^2}{\sqrt2+\sqrt3} \ge a_n\sqrt{n}. \]

But since

\[ 1 = a_1+2a_2+\cdots+na_n - (a_1+\cdots+a_n) \ge (n-1)a_n, \]

we have $a_n \le \frac{1}{n-1}$ and so it is enough to prove the previous inequality with $a_n$
replaced by $\frac{1}{n-1}$. But this is immediate for $n \ge 16$, which ends the solution. □
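
The strange hypothesis $n \ge 16$ is exactly what the last inequality needs. As a sanity check (a floating-point sketch of ours, with the hypothetical helper `margin`), one can evaluate the difference of the two sides at $a_n = \frac{1}{n-1}$:

```python
import math

def margin(n):
    """(1 - a)^2 / (sqrt(2) + sqrt(3)) - a*sqrt(n) evaluated at a = 1/(n - 1)."""
    a = 1.0 / (n - 1)
    return (1 - a) ** 2 / (math.sqrt(2) + math.sqrt(3)) - a * math.sqrt(n)

# First n for which the final inequality of the proof holds.
first_good = next(n for n in range(3, 100) if margin(n) >= 0)
print(first_good, margin(15), margin(16))
```

The margin is negative for $n = 15$ and positive from $n = 16$ on, which is consistent with the statement of the problem.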


We end this chapter with two challenging inequalities. The first one uses
a combination of $T_2$'s lemma and a rather subtle linearization technique.


18. Prove that for all positive numbers $a,b,c$ the following inequality holds

\[ \sqrt{\frac{a}{8b+c}} + \sqrt{\frac{b}{8c+a}} + \sqrt{\frac{c}{8a+b}} \ge 1. \]

Vo Quoc Ba Can








Proof. We use $T_2$'s lemma in the form

\[ \sum \sqrt{\frac{a}{8b+c}} = \sum \frac{(\sqrt{a})^2}{\sqrt{a(8b+c)}} \ge \frac{\left(\sum \sqrt{a}\right)^2}{\sum \sqrt{a(8b+c)}}, \]

so it remains to prove that

\[ \left(\sum \sqrt{a}\right)^2 \ge \sum \sqrt{a(8b+c)}. \]

Define $x = \sqrt{a}$, $y = \sqrt{b}$, $z = \sqrt{c}$, so that the inequality becomes

\[ \sum x\sqrt{8y^2+z^2} \le \left(\sum x\right)^2. \]

This is actually a very strong inequality and it is easy to see that it resists any
attempt to prove it using Cauchy-Schwarz (even if its form invites us to use
such a technique). We will use a linearization technique, by approximating
first $\sqrt{8y^2+z^2}$ by a rational function in $y,z$. The crucial ingredient is the
following estimate:

\[ \sqrt{8y^2+z^2} \le 3y + z - \frac{3yz}{2y+z}. \]

The hard point is to figure out that such an inequality holds, since proving it
is a very easy matter. Note that the right-hand side is clearly positive, so by
squaring and canceling similar terms we obtain the equivalent inequality

\[ y^2 + 6yz + \frac{9y^2z^2}{(2y+z)^2} \ge \frac{6yz(3y+z)}{2y+z} \iff \frac{4y^2(y-z)^2}{(2y+z)^2} \ge 0. \]

Using this estimate we conclude:

\[ \sum x\sqrt{8y^2+z^2} \le 4\sum xy - 3xyz\sum \frac{1}{2y+z} \]

and since

\[ \sum \frac{xyz}{2y+z} \ge \frac{3xyz}{x+y+z}, \]

it remains to prove that

\[ \left(\sum x\right)^2 \ge 4\sum xy - \frac{9xyz}{x+y+z}. \]

It is not difficult to check that the last inequality is actually equivalent to
Schur's inequality, which finishes the proof of this hard problem. □
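
The squaring step behind the key estimate is an identity between rational functions, so it can be tested exactly. The snippet below is a sketch of ours (the names `upper` and `squared_gap` are hypothetical) that compares both sides on a grid of positive rationals.

```python
from fractions import Fraction

def upper(y, z):
    """Right-hand side 3y + z - 3yz/(2y + z) of the key estimate."""
    return 3 * y + z - 3 * y * z / (2 * y + z)

def squared_gap(y, z):
    """upper(y, z)^2 - (8y^2 + z^2), expected to equal 4y^2 (y - z)^2 / (2y + z)^2."""
    return upper(y, z) ** 2 - (8 * y ** 2 + z ** 2)

points = [(Fraction(p, 7), Fraction(q, 5)) for p in range(1, 15) for q in range(1, 15)]
ok = all(squared_gap(y, z) == 4 * y ** 2 * (y - z) ** 2 / (2 * y + z) ** 2
         for y, z in points)
print(ok)
```

In particular the estimate is an equality exactly when $y = z$, which matches the equality case $a = b = c$ of the problem.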


The next problem requires some preliminaries. We will prove the following 
beautiful discrete variant of a classical inequality of Wirtinger. 


Theorem 5.3. (Fan's inequality) If $x_1, x_2, \ldots, x_n$ are real numbers which add
up to zero, then

\[ (x_1^2+x_2^2+\cdots+x_n^2)\cos\frac{2\pi}{n} \ge x_1x_2 + x_2x_3 + \cdots + x_nx_1. \]


Proof. We will actually mimic the proof of Wirtinger's inequality by using
finite Fourier transforms (for more on this fascinating subject, the reader is
invited to read the addendum 7.A). Namely, if $z_1, z_2, \ldots, z_n$ is a sequence of
complex numbers, define

\[ \hat{z}_j = \sum_{k=1}^n z_k e^{-\frac{2i\pi kj}{n}}. \]

Using the fact that

\[ \sum_{k=1}^n e^{\frac{2i\pi kj}{n}} = 0 \]

if and only if $j$ is not a multiple of $n$ (in which case it equals $n$), it is easy to
check that we can recover our sequence from the sequence of its finite Fourier
transforms using the inversion formula

\[ z_j = \frac1n \sum_{k=1}^n \hat{z}_k e^{\frac{2i\pi kj}{n}}. \]

We also have the following discrete version of Parseval's identity:

\[ \sum_{k=1}^n |\hat{z}_k|^2 = n\sum_{k=1}^n |z_k|^2. \]

Indeed,

\[ \sum_{j} |\hat{z}_j|^2 = \sum_{j}\sum_{k_1,k_2} z_{k_1}\overline{z_{k_2}}\, e^{-\frac{2i\pi(k_1-k_2)j}{n}} = \sum_{k_1,k_2} z_{k_1}\overline{z_{k_2}} \sum_{j} e^{-\frac{2i\pi(k_1-k_2)j}{n}} = n\sum_{k_1=k_2} z_{k_1}\overline{z_{k_2}} = n\sum_{k} |z_k|^2. \]

Now, the proof of Fan's inequality is very easy: write the inequality in the
form

\[ \sum_k (x_k-x_{k+1})^2 \ge \left(2-2\cos\frac{2\pi}{n}\right)\sum_k x_k^2. \]

Note that the hypothesis $x_1+x_2+\cdots+x_n = 0$ can be written in the very simple
form $\hat{x}_n = 0$. Now, let² $y_j = x_j - x_{j+1}$ and observe that $\hat{y}_j = \left(1-e^{\frac{2i\pi j}{n}}\right)\hat{x}_j$, as
follows from the definitions. Thus, using Parseval's identity we obtain

\[ \sum_j y_j^2 = \frac1n\sum_j |\hat{y}_j|^2 = \frac1n\sum_j \left|1-e^{\frac{2i\pi j}{n}}\right|^2 |\hat{x}_j|^2 = \frac1n\sum_{j=1}^{n-1}\left(2-2\cos\frac{2\pi j}{n}\right)|\hat{x}_j|^2 \ge \frac1n\left(2-2\cos\frac{2\pi}{n}\right)\sum_j |\hat{x}_j|^2 \]

and another application of Parseval's identity yields the desired inequality. □

² This is the analogue of the derivative of a function when establishing discrete inequalities.
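
Both Parseval's identity and Fan's inequality are easy to test numerically. The following floating-point sketch is our own illustration: the transform uses the sign convention $\hat{z}_j = \sum_k z_k e^{-2i\pi kj/n}$ with indices $1, \ldots, n$ (an assumption about the convention of the text), and the helper names are hypothetical.

```python
import cmath
import math
import random

def dft(z):
    """Finite Fourier transform with indices 1..n: hat z_j = sum_k z_k e^{-2 i pi k j / n}."""
    n = len(z)
    return [sum(z[k - 1] * cmath.exp(-2j * math.pi * k * j / n) for k in range(1, n + 1))
            for j in range(1, n + 1)]

def fan_gap(x):
    """(x1^2 + ... + xn^2) cos(2 pi / n) - (x1 x2 + ... + xn x1) for a zero-sum list x."""
    n = len(x)
    return (sum(t * t for t in x) * math.cos(2 * math.pi / n)
            - sum(x[i] * x[(i + 1) % n] for i in range(n)))

def centered_sample(n):
    """Random reals shifted so that they add up to zero."""
    raw = [random.uniform(-1, 1) for _ in range(n)]
    mean = sum(raw) / n
    return [t - mean for t in raw]

random.seed(1)
x = centered_sample(7)
hat = dft(x)
print(abs(hat[-1]))                                            # hat x_n, close to 0
print(sum(abs(h) ** 2 for h in hat) - 7 * sum(t * t for t in x))  # Parseval defect, close to 0
print(fan_gap(x))                                              # nonnegative up to rounding
```

Equality in Fan's inequality is attained, for instance, by $x_k = \cos\frac{2\pi k}{n}$, which corresponds to a transform supported on the two frequencies $j = 1$ and $j = n-1$.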


We are now ready to prove the following version of Shapiro's inequality.
The proof is based on a very tricky application of $T_2$'s lemma combined with
Fan's inequality.


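19. Let $a_n = \frac{1}{\sqrt{2\cos\frac{2\pi}{n}-1}}$. Prove that for all $x_1, x_2, \ldots, x_n \in \left[\frac{1}{a_n}, a_n\right]$, the
Shapiro inequality holds:

\[ \frac{x_1}{x_2+x_3} + \frac{x_2}{x_3+x_4} + \cdots + \frac{x_n}{x_1+x_2} \ge \frac{n}{2}. \]

Vasile Cartoaje, Gabriel Dospinescu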


Proof. First, we write the inequality in the form

\[ \sum_{i=1}^n \frac{x_i - \frac{x_{i+1}+x_{i+2}}{2a_n^2}}{x_{i+1}+x_{i+2}} \ge \frac{n}{2}\left(1-\frac{1}{a_n^2}\right). \]

Note that

\[ x_i \ge \frac{1}{a_n} \ge \frac{x_{i+1}+x_{i+2}}{2a_n^2}. \]

Using $T_2$'s lemma, it suffices to prove that

\[ \left(1-\frac{1}{a_n^2}\right)\left(\sum x_i\right)^2 \ge \frac{n}{2}\left(\sum x_i(x_{i+1}+x_{i+2}) - \frac{1}{2a_n^2}\sum (x_i+x_{i+1})^2\right). \]

A crucial step is to note that we have the identity

\[ \sum x_i(x_{i+1}+x_{i+2}) = \sum (x_i+x_{i+1})(x_{i+1}+x_{i+2}) - \frac12\sum (x_i+x_{i+1})^2, \]

which allows us to perform the substitution $x_i+x_{i+1} = 2b_i$ and reduce the
problem to proving that

\[ \left(1-\frac{1}{a_n^2}\right)\left(\sum b_i\right)^2 \ge n\left(2\sum b_ib_{i+1} - \left(1+\frac{1}{a_n^2}\right)\sum b_i^2\right) \]

for nonnegative real numbers $b_i$.

This simplifies drastically if we make another substitution

\[ c_i = b_i - \frac{b_1+b_2+\cdots+b_n}{n}. \]

A rather tedious, but straightforward computation shows that the previous
inequality is equivalent to

\[ \left(1+\frac{1}{a_n^2}\right)\sum c_i^2 \ge 2\sum c_ic_{i+1}. \]

Of course, we have $c_1+c_2+\cdots+c_n = 0$. Finally, note that

\[ 1+\frac{1}{a_n^2} = 2\cos\frac{2\pi}{n}. \]

Thus, the result follows from Fan's inequality. □


5.1 Notes 


We thank the following people for providing solutions: Alexandru 
Chirvăsitu (problem 16), Xiangyi Huang (problem 14), Michael Rozenberg 
(problem 15), Dusan Sobot (problems 1, 3), Gjergji Zaimi (problems 1, 10, 
11). 


Addendum 5.A Hölder's Inequality in Action


The purpose of this addendum is to present some applications of Hölder's
inequality, which are quite similar to the problems considered in this chapter.
Of course, Hölder's inequality is important because of its applications in
measure theory, probability theory and analysis and not really for the amusing
problems to be discussed here. Actually, we will not even deal with the
classical version

\[ (a_1^p+a_2^p+\cdots+a_n^p)^{\frac1p}(b_1^q+b_2^q+\cdots+b_n^q)^{\frac1q} \ge a_1b_1+a_2b_2+\cdots+a_nb_n, \]

which holds for any positive real numbers $a_i, b_i, p, q$ such that $\frac1p+\frac1q = 1$, but
rather with the following:

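Theorem 5.A.1. Let $a_{ij}$ be positive real numbers. Then

\[ \prod_{j=1}^k (a_{j1}+a_{j2}+\cdots+a_{jn}) \ge \left(\sqrt[k]{a_{11}a_{21}\cdots a_{k1}} + \cdots + \sqrt[k]{a_{1n}a_{2n}\cdots a_{kn}}\right)^k. \]

Proof. This is a very easy application of the AM-GM inequality. Indeed, just
add up the following inequalities

\[ k\sqrt[k]{\frac{a_{11}a_{21}\cdots a_{k1}}{S_1S_2\cdots S_k}} \le \frac{a_{11}}{S_1} + \cdots + \frac{a_{k1}}{S_k}, \]

\[ \vdots \]

\[ k\sqrt[k]{\frac{a_{1n}a_{2n}\cdots a_{kn}}{S_1S_2\cdots S_k}} \le \frac{a_{1n}}{S_1} + \cdots + \frac{a_{kn}}{S_k}, \]

where $S_i = a_{i1}+a_{i2}+\cdots+a_{in}$. □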
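
To get a feeling for theorem 5.A.1, one can compare its two sides on random matrices. This is only a floating-point sketch of ours (the function `holder_sides` is a hypothetical name); the rows of the matrix `a` play the role of the $k$ factors and the columns the role of the $n$ summands.

```python
import math
import random

def holder_sides(a):
    """Return (left, right) of Theorem 5.A.1 for a k x n matrix a of positive reals."""
    k, n = len(a), len(a[0])
    left = math.prod(sum(row) for row in a)
    right = sum(math.prod(a[j][i] for j in range(k)) ** (1.0 / k) for i in range(n)) ** k
    return left, right

random.seed(2)
trials = [[[random.uniform(0.1, 5.0) for _ in range(n)] for _ in range(k)]
          for k in range(2, 6) for n in range(2, 6) for _ in range(50)]
print(all(left >= right * (1 - 1e-9) for left, right in map(holder_sides, trials)))
print(holder_sides([[1, 2, 3], [2, 4, 6]]))  # proportional rows: the two sides agree
```

The second call illustrates the equality case: when the rows are proportional, each AM-GM inequality in the proof is an equality.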


We are now able to obtain a generalization of $T_2$'s lemma that turns out
to be very handy in quite a lot of situations when $T_2$'s lemma fails. There are
of course much more general results than the following one, but our purpose
is not to delve into the greatest generality.

Theorem 5.A.2. Let $a_1, a_2, \ldots, a_n$ and $x_1, x_2, \ldots, x_n$ be positive real numbers
and let $q > p$ be two positive integers. Then

\[ \frac{a_1^q}{x_1^p} + \frac{a_2^q}{x_2^p} + \cdots + \frac{a_n^q}{x_n^p} \ge n^{1+p-q}\,\frac{(a_1+a_2+\cdots+a_n)^q}{(x_1+x_2+\cdots+x_n)^p}. \]

Proof. Note that by the previous theorem we have

\[ (x_1+x_2+\cdots+x_n)^p\left(\frac{a_1^q}{x_1^p} + \frac{a_2^q}{x_2^p} + \cdots + \frac{a_n^q}{x_n^p}\right) = (x_1+\cdots+x_n)\cdots(x_1+\cdots+x_n)\cdot\left(\frac{a_1^q}{x_1^p} + \frac{a_2^q}{x_2^p} + \cdots + \frac{a_n^q}{x_n^p}\right) \]

\[ \ge \left(a_1^{\frac{q}{p+1}} + a_2^{\frac{q}{p+1}} + \cdots + a_n^{\frac{q}{p+1}}\right)^{p+1} \]

and the result follows from the Power Mean inequality. □


The first theorem is particularly useful when working with sums of square
(or cubic or ...) roots, which are a nightmare most of the time. Here are a
few examples that will probably convince you how useful this result is. Most
of the problems are quite hard to solve by other means.


A.2. Prove that for all positive real numbers $a,b,c$ such that $abc = 1$ we have

\[ \sqrt[3]{a^3+7} + \sqrt[3]{b^3+7} + \sqrt[3]{c^3+7} \le 2(a+b+c). \]

Vasile Cârtoaje

Proof. This is quite a strong inequality, but the previous results are effective
in such a context:

\[ \left(\sqrt[3]{a^3+7} + \sqrt[3]{b^3+7} + \sqrt[3]{c^3+7}\right)^3 \le (1+1+1)(a+b+c)\left((a^2+7bc)+(b^2+7ca)+(c^2+7ab)\right) \]

and the result follows immediately from the inequality

\[ ab+bc+ca \le a^2+b^2+c^2. \]

□




A.3. For any positive real numbers $a,b,c$ the following inequality holds

\[ \frac{b+c}{\sqrt{a^2+bc}} + \frac{c+a}{\sqrt{b^2+ca}} + \frac{a+b}{\sqrt{c^2+ab}} \ge 4. \]

Pham Kim Hung

Proof. Let $S$ be the left-hand side. Using Hölder's inequality, we can write

\[ S^2\cdot\left(\sum (b+c)(a^2+bc)\right) \ge 8(a+b+c)^3, \]

so it is enough to prove the stronger inequality

\[ (a+b+c)^3 \ge 2\sum (b+c)(a^2+bc). \]

An easy computation shows that this follows from Schur's inequality. □


A.4. Let $a,b,c$ be positive reals such that $abc = 1$. Show that

\[ \frac{1}{a^5(b+2c)^2} + \frac{1}{b^5(c+2a)^2} + \frac{1}{c^5(a+2b)^2} \ge \frac13. \]

Titu Andreescu, USA TST 2010

Proof. The substitution $a = \frac1x$, $b = \frac1y$, $c = \frac1z$ reduces the problem to

\[ \frac{x^3}{(2y+z)^2} + \frac{y^3}{(2z+x)^2} + \frac{z^3}{(2x+y)^2} \ge \frac13. \]

By theorem 5.A.2,

\[ \frac{x^3}{(2y+z)^2} + \frac{y^3}{(2z+x)^2} + \frac{z^3}{(2x+y)^2} \ge \frac{x+y+z}{9} \]

and the result follows from the AM-GM inequality. □




A.5. Prove that if $x_1, x_2, \ldots, x_n$ are positive real numbers with product 1,
then

\[ n^n(1+x_1^n)(1+x_2^n)\cdots(1+x_n^n) \ge \left(x_1+x_2+\cdots+x_n+\frac{1}{x_1}+\frac{1}{x_2}+\cdots+\frac{1}{x_n}\right)^n. \]

Gabriel Dospinescu

Proof. This follows easily by adding the inequalities

\[ \sqrt[n]{(x_1^n+1)(x_2^n+1)\cdots(x_n^n+1)} \ge x_i + x_1x_2\cdots x_{i-1}x_{i+1}\cdots x_n = x_i + \frac{1}{x_i} \]

obtained from Hölder's inequality. □


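A.6. For any positive real numbers $a,b,c$ the following inequality holds

\[ \sqrt[3]{a\cdot\frac{a+b}{2}\cdot\frac{a+b+c}{3}} \ge \frac{a+\sqrt{ab}+\sqrt[3]{abc}}{3}. \]

Kiran Kedlaya

Proof. Using theorem 5.A.1, we obtain

\[ (a+a+a)\left(a+\frac{a+b}{2}+b\right)(a+b+c) \ge \left(a+\sqrt[3]{ab\cdot\frac{a+b}{2}}+\sqrt[3]{abc}\right)^3. \]

One concludes by observing that $\sqrt[3]{ab\cdot\frac{a+b}{2}} \ge \sqrt{ab}$. □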


A.7. Prove that for all real numbers $a,b,c$ we have

\[ 2(1+a^2)(1+b^2)(1+c^2) \ge (1+ab+bc+ca)^2. \]

Michael Rozenberg

Proof. This is a pretty tricky application of Hölder's inequality:

\[ 4(1+a^2)^2(1+b^2)^2(1+c^2)^2 = (a^2b^2+a^2+b^2+1)(b^2+c^2+b^2c^2+1)\cdot(1+1+1+1)(a^2+a^2c^2+c^2+1) \ge (1+ab+bc+ca)^4. \]

The result follows. □


A.8. Prove that for all positive real numbers $a,b,c$ the following inequality
holds

\[ abc + \sqrt[3]{(a^3+1)(b^3+1)(c^3+1)} \ge ab+bc+ca. \]

Proof. Using Hölder's inequality, we can write

\[ \sqrt[3]{(a^3+1)(b^3+1)(c^3+1)} \ge ab+c. \]

We would like to have

\[ abc+ab+c \ge ab+bc+ca, \]

which is equivalent to $(a-1)(b-1) \ge 0$. There is no reason for this to hold,
but at least one of the inequalities $(a-1)(b-1) \ge 0$, $(b-1)(c-1) \ge 0$ and
$(a-1)(c-1) \ge 0$ holds, as two of the numbers $a-1, b-1, c-1$ must have
the same sign. The result follows. □


A.9. Prove that for all positive real numbers $a,b,c$ the following inequality
holds

\[ (a^5-a^2+3)(b^5-b^2+3)(c^5-c^2+3) \ge (a+b+c)^3. \]

Titu Andreescu, USAMO 2004

Proof. Of course, we cannot apply theorem 5.A.1 directly in this case, but the
form of the inequality is a temptation to find a way to apply theorem 5.A.1.
The key point is the inequality

\[ a^5-a^2+3 \ge a^3+2, \]

which is equivalent to $(a^2-1)(a^3-1) \ge 0$ and thus obvious. Using this, the
result follows immediately from theorem 5.A.1, by writing

\[ a^3+2 = a^3+1^3+1^3. \]

□


A.10. Let $a,b,c$ be the sides of a triangle. Prove that

\[ \sqrt{\frac{abc}{b+c-a}} + \sqrt{\frac{abc}{c+a-b}} + \sqrt{\frac{abc}{a+b-c}} \ge a+b+c. \]

Titu Andreescu, Gabriel Dospinescu

Proof. This is a hard inequality. The point is to combine Schur's inequality

\[ a^2(a-b)(a-c) + b^2(b-a)(b-c) + c^2(c-a)(c-b) \ge 0, \]

which can also be written as

\[ abc(a+b+c) \ge a^3(b+c-a) + b^3(c+a-b) + c^3(a+b-c), \]

with theorem 5.A.2. Indeed, observe that we can write

\[ \sum a^3(b+c-a) = \sum \frac{a^3}{\left(\frac{1}{\sqrt{b+c-a}}\right)^2} \ge \frac{\left(\sum a\right)^3}{\left(\sum \frac{1}{\sqrt{b+c-a}}\right)^2}. \]

The result follows. □


A.11. Prove that for all $a,b,c > 0$ we have

\[ \left(\frac{a^2}{b}+\frac{b^2}{c}+\frac{c^2}{a}\right)^4 \ge 27\cdot(a^4+b^4+c^4). \]

Proof. If $X$ is the square root of the left-hand side, then Hölder's inequality
yields

\[ X\cdot(a^2b^2+b^2c^2+c^2a^2) \ge (a^2+b^2+c^2)^3, \]

so that after substituting $x = a^2$, $y = b^2$ and $z = c^2$, it is enough to prove the
inequality

\[ (x+y+z)^6 \ge 27(xy+yz+zx)^2(x^2+y^2+z^2). \]

But this is immediate from the AM-GM inequality and the identity

\[ x^2+y^2+z^2+2(xy+yz+zx) = (x+y+z)^2. \]

□


A.12. Prove that for all positive real numbers $a,b,c,d$

\[ \frac{a^3+1}{a^2+1}\cdot\frac{b^3+1}{b^2+1}\cdot\frac{c^3+1}{c^2+1}\cdot\frac{d^3+1}{d^2+1} \ge \frac{abcd+1}{2}. \]

Vasile Cârtoaje, Gazeta Matematica

Proof. This is also a very hard problem. Of course, one is again tempted to
use the theorem, but one needs a trick. The key point is that for all positive
numbers $a$ we have

\[ \frac{a^3+1}{a^2+1} \ge \sqrt[4]{\frac{a^4+1}{2}}. \]

Indeed, using the inequality

\[ (a^2+1)^4 \le (a+1)^2(a^3+1)^2, \]

which is just Cauchy-Schwarz, one reduces this to

\[ 2(a^3+1)^2 \ge (a+1)^2(a^4+1), \]

which, after division by $(a+1)^2$ and a small computation is equivalent to
$(a-1)^4 \ge 0$.

Once we have this key inequality, the rest is an immediate application of
theorem 5.A.1. □


Chapter 6 


Some Classical Problems in 
Extremal Graph Theory 


This elementary chapter is a variation on a classical topic in extremal 
graph theory: Turán’s theorem on graphs without cliques. Recall that if G is 
a graph and k is a positive integer, then a k-clique is a set of k vertices, any 
two of which are connected. A graph is called k-free if it does not contain a 
k-clique. We then have the following standard result. 


Theorem 6.1. (Turán) The maximal number of edges in a $k$-free graph with
$n$ vertices is

\[ \frac{k-2}{k-1}\cdot\frac{n^2-r^2}{2} + \binom{r}{2}, \]

where $r$ is the remainder of $n$ when divided by $k-1$.

In particular, the maximum number of edges in a $k$-free graph with $n$
vertices is at most $\frac{k-2}{k-1}\cdot\frac{n^2}{2}$, which is really the estimate that we will constantly
use. We start with a series of rather direct applications of these two theorems.
The other problems are however different in nature and more difficult.
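
The formula of theorem 6.1 can be compared with a direct edge count in the extremal example, the complete $(k-1)$-partite graph with parts as equal as possible. The sketch below is our own illustration; the function names are hypothetical and it only checks the arithmetic, not the extremality.

```python
from fractions import Fraction
from math import comb

def turan_number(n, k):
    """Maximal number of edges of a k-free graph on n vertices (formula of Theorem 6.1)."""
    r = n % (k - 1)
    value = Fraction(k - 2, k - 1) * Fraction(n * n - r * r, 2) + comb(r, 2)
    assert value.denominator == 1  # the formula always produces an integer
    return int(value)

def turan_graph_edges(n, k):
    """Edges of the complete (k-1)-partite graph with parts as equal as possible."""
    parts, r = k - 1, n % (k - 1)
    sizes = [n // parts + 1] * r + [n // parts] * (parts - r)
    return comb(n, 2) - sum(comb(s, 2) for s in sizes)

def simple_bound(n, k):
    """The weaker estimate (k-2)/(k-1) * n^2/2 used throughout the chapter."""
    return Fraction(k - 2, k - 1) * Fraction(n * n, 2)

print(turan_number(5, 3), turan_graph_edges(5, 3), simple_bound(5, 3))
```

For instance, a triangle-free graph on 5 vertices has at most 6 edges (attained by $K_{2,3}$), while the simpler estimate gives $\frac{25}{4}$.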


1. Let $x_1, x_2, \ldots, x_n$ be real numbers. Prove that there are at most $\frac{n^2}{4}$
pairs $(i,j) \in \{1,2,\ldots,n\}^2$ such that $i < j$ and $1 < |x_i-x_j| < 2$.

MOSP 2001




Proof. Consider the graph whose vertices are $1,2,\ldots,n$. Connect two vertices
$i,j$ by an edge if $(i,j)$ satisfies $1 < |x_i-x_j| < 2$. We claim that this graph
contains no triangle. If we manage to prove this, the result follows from
Turán's theorem. Suppose that the graph contains a triangle, thus we can
find distinct $a,b,c$ such that

\[ 1 < |x_a-x_b| < 2, \quad 1 < |x_b-x_c| < 2, \quad 1 < |x_c-x_a| < 2. \]

By symmetry, we may assume that $x_a < x_b < x_c$. Then the previous
inequalities become $x_b-x_a > 1$, $x_c-x_b > 1$, so $x_c-x_a > 2$, contradicting the
inequality $|x_c-x_a| < 2$. The result follows. □

2. Prove that if $n$ points lie on a unit circle, then at most $\frac{n^2}{3}$ segments
connecting them have length greater than $\sqrt2$.

Poland 1997


Proof. Consider the graph whose vertices are the $n$ points and connect two
vertices if their distance is greater than $\sqrt2$. The point is that this graph
contains no 4-clique. This is clear, since a chord of length $\sqrt2$ subtends a
central angle of $\frac{\pi}{2}$. Thus, by Turán's theorem the number of edges is at
most $\frac{n^2}{3}$, finishing the proof. □

3. There are 1999 people participating in an exhibition. Out of any 50 
people, at least two do not know each other. Prove that we can find at 

least 41 people who each know at most 1958 other people. 


Taiwan 1999 


Proof. This is a special case of the proof of Zarankiewicz's lemma. For the
reader's convenience, let us recall the proof. Suppose that the conclusion does
not hold, so we can find at least 1959 people, say $A_1, \ldots, A_{1959}$, each having
at least 1959 friends. Start with $A_1$. There is a person among $A_2, \ldots, A_{1959}$
that knows $A_1$. Assuming that we found persons $A_{i_1}, \ldots, A_{i_k}$ ($i_1 = 1$) among
$A_1, \ldots, A_{1959}$, every two knowing each other, let us try to add a new person
$A_{i_{k+1}}$ to the group. But there are at least $k\cdot 1959-(k-1)\cdot 1999$ persons that
know $A_{i_1}, \ldots, A_{i_k}$. Thus, if $k\cdot 1959-(k-1)\cdot 1999 \ge 1$, we can always add
one more person. If $k \le 48$, then there are at least 79 people who know all
of $A_{i_1}, \ldots, A_{i_k}$ and one of them is among $A_1, A_2, \ldots, A_{1959}$. If $k = 49$, then
there are at least 39 people who know all of $A_{i_1}, \ldots, A_{i_k}$ and (although that
person may not be among $A_1, A_2, \ldots, A_{1959}$) he completes a set of 50 all of
whom know each other. □


4. We are given 5n points in a plane and we connect some of them so that 
10n? +1 segments are drawn. We color these segments in 2 colors. Prove 
that we can find a monochromatic triangle. 


Proof. Note that the corresponding graph (with vertices the $5n$ points and
edges between points connected by a segment) contains a 6-clique. Indeed,
otherwise by Turán's theorem it has at most $\frac{(5n)^2}{2}\cdot\frac45 = 10n^2$ edges, which
contradicts the hypothesis. Now, consider a 6-clique $G'$ of our graph $G$. The
edges of $G'$ are colored in two colors and we claim that there is always a
monochromatic triangle in $G'$. This is standard, but we recall the proof.
Pick a vertex $v_1$ of $G'$. By the pigeonhole principle, we may assume that
$v_1v_2, v_1v_3, v_1v_4$ have the same color. If one of the edges $v_2v_3$, $v_2v_4$ or $v_3v_4$
has the same color as $v_1v_2$, we are done. Otherwise, the triangle $v_2v_3v_4$ is
monochromatic and we win again. □
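
The standard fact recalled in this proof is small enough to be confirmed by exhaustive search. The following brute-force sketch is our own addition (the function names are hypothetical): it runs over all $2^{15}$ colorings of the edges of a 6-clique, and also shows that the same statement fails for a 5-clique.

```python
from itertools import combinations

def has_mono_triangle(n, color):
    """True if some triangle on vertices 0..n-1 has its three edges of the same color."""
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_mono_triangle(n):
    """Check all 2-colorings of the edges of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    for mask in range(1 << len(edges)):
        # bit i of mask is the color (0 or 1) of the i-th edge
        color = {e: (mask >> i) & 1 for i, e in enumerate(edges)}
        if not has_mono_triangle(n, color):
            return False
    return True

print(every_coloring_has_mono_triangle(6))
print(every_coloring_has_mono_triangle(5))
```

The search stops early for $n = 5$, since a coloring without monochromatic triangle exists there (for example, the pentagon in one color and the pentagram in the other).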





The following problem does not use Turán’s theorem, but it is still a very 
classical topic, namely Ramsey’s numbers. 


5. A group of people is called $n$-balanced if the following two conditions
are satisfied:

a) among any three people, there are two who know each other;

b) among any $n$ people, there are at least two not knowing each other.

Prove that there are always at most $\frac{(n-1)(n+2)}{2}$ people in an $n$-balanced
group.


Dorel Mihet, Romanian TST 2008 




Proof. We will prove the result by induction. For $n = 2$, this is trivial, so
assume that it holds for $n-1$. Consider an $n$-balanced group and pick an
arbitrary person $P$. Let $A$ be the set of friends of $P$ and let $B$ be the set
of all the other persons in the group. By hypothesis, any two persons in $B$
are friends and $B$ has at most $n-1$ elements. By induction, $A$ has at most
$\frac{(n-2)(n+1)}{2}$ persons (by b) and the fact that $P$ knows all elements of $A$, it
follows that $A$ is an $(n-1)$-balanced group). Thus there are at most

\[ \frac{(n-2)(n+1)}{2} + (n-1) + 1 = \frac{(n-1)(n+2)}{2} \]

persons in the group, which is enough to prove the inductive step. □


The following problem and the method of proof are absolute classics. 


6. Prove that a graph with $n$ vertices and $k$ edges has at least $\frac{k}{3n}(4k-n^2)$
triangles.

APMO 1989


Proof. Let $x_1, x_2, \ldots, x_n$ be the vertices of the graph $G$ and let $d(x)$ be the
degree of $x$. Observe that for a given edge $e = x_ix_j$ with endpoints $x_i, x_j$,
there are at least $d(x_i)+d(x_j)-n$ triangles containing $e$. Indeed, there are
$d(x_i)+d(x_j)-2$ edges having as endpoints one of $x_i, x_j$ and another vertex
among the $n-2$ remaining vertices, so there are at least $d(x_i)+d(x_j)-2-(n-2)$
triangles containing $x_i, x_j$ as vertices. Summing over all edges and taking into
account that we count three times every triangle in this way, shows that the
number of triangles is at least (we denote by $E(G)$ the set of edges of $G$)

\[ \frac13\sum_{e = x_ix_j \in E(G)} \left(d(x_i)+d(x_j)-n\right). \]

The previous sum is also equal to

\[ \frac13\left(\sum_{i=1}^n d(x_i)^2 - nk\right). \]

Applying Cauchy-Schwarz and taking into account that

\[ \sum_{i=1}^n d(x_i) = 2k \]

yields the desired result. □
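
The bound can be compared with an honest count of triangles on small graphs. The sketch below is our own experiment (hypothetical helper names, random graphs on 8 vertices); it uses exact fractions for the bound and brute force for the count.

```python
from fractions import Fraction
from itertools import combinations
import random

def count_triangles(n, edges):
    """Brute-force count of triangles in a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return sum(1 for t in combinations(range(n), 3)
               if all(frozenset(p) in edge_set for p in combinations(t, 2)))

def triangle_lower_bound(n, k):
    """The bound k(4k - n^2)/(3n) of the problem, as an exact fraction."""
    return Fraction(k * (4 * k - n * n), 3 * n)

def random_graph(n, p):
    """Each possible edge is kept independently with probability p."""
    return [e for e in combinations(range(n), 2) if random.random() < p]

random.seed(3)
graphs = [random_graph(8, p) for p in (0.3, 0.6, 0.9) for _ in range(50)]
print(all(count_triangles(8, g) >= triangle_lower_bound(8, len(g)) for g in graphs))

complete = list(combinations(range(6), 2))
print(count_triangles(6, complete), triangle_lower_bound(6, len(complete)))
```

The complete graph is an equality case: all degrees are equal, so Cauchy-Schwarz is sharp, and every edge lies in exactly $d(x_i)+d(x_j)-n$ triangles.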
The following problem is very similar to the previous one. 


7. A graph with $n$ vertices and $k$ edges has no triangles. Prove that we can
choose a vertex such that the subgraph obtained by deleting this vertex
and all its neighbors has at most $k\left(1-\frac{4k}{n^2}\right)$ edges.

USAMO 1995


Proof. Each edge xy gets killed when we remove x and its neighbors, when
we remove y and its neighbors, or when we remove any other neighbor of
x or of y together with its neighbors. Since the graph has no triangles, x and y
have no common neighbor, so this edge is killed a total of d(x) + d(y)
times. Summing over all edges and taking into account that d(x) appears as
a summand exactly d(x) times, we obtain that the average number of edges
killed by removing a vertex and its neighbors is $\frac{1}{n} \sum_{x} d(x)^2$. Since

$$\sum_{x} d(x) = 2k,$$

we have

$$\frac{1}{n} \sum_{x} d(x)^2 \geq \left( \frac{2k}{n} \right)^2.$$

Thus, on average we kill at least $\frac{4k^2}{n^2}$ edges. The result follows. □
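As a sanity check of the averaging argument, one can delete each closed neighborhood in turn and compare the best outcome with the bound. This is a small sketch; the function names and the two test graphs (the 5-cycle and the Petersen graph, both triangle-free) are our own choices.

```python
from fractions import Fraction


def surviving_edges(n, edges, v):
    """Number of edges left after deleting v and all its neighbours."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    removed = adj[v] | {v}
    return sum(1 for a, b in edges if a not in removed and b not in removed)


def best_deletion(n, edges):
    """Smallest number of surviving edges over all choices of the vertex."""
    return min(surviving_edges(n, edges, v) for v in range(n))


def bound(n, k):
    """The quantity k(1 - 4k/n^2), computed exactly."""
    return Fraction(k * (n * n - 4 * k), n * n)


# Two triangle-free examples: the 5-cycle and the Petersen graph.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
petersen = (
    [(i, (i + 1) % 5) for i in range(5)]            # outer cycle
    + [(i, i + 5) for i in range(5)]                # spokes
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]  # inner pentagram
)

if __name__ == "__main__":
    print(best_deletion(5, cycle5), bound(5, 5))
    print(best_deletion(10, petersen), bound(10, 15))
```

Both graphs are regular, so the average is attained by every vertex and the bound is met with equality.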


8. A graph G has n vertices and contains no complete subgraph with four
vertices. Prove that G contains at most $\frac{n^3}{27}$ triangles.


Ivan Borsenco, Mathematical Reflections 




Proof. The result is easy for n = 2, 3, 4, so let us assume that n > 4 and that
the result holds for all k < n. We may assume that G contains a triangle ABC.
Let $G_1$ be the subgraph formed by the vertices of G different from A, B, C.
Since $G_1$ has no 4-clique, it has at most $\frac{(n-3)^2}{3}$ edges by Turán’s theorem. By
the inductive hypothesis, $G_1$ has at most $\frac{(n-3)^3}{27}$ triangles. Any other triangle
in G consists either of a vertex of $G_1$ and an edge of ABC or of an edge of
$G_1$ and a vertex of ABC. Moreover, any edge of $G_1$ forms a triangle with at
most one vertex of ABC and any vertex of $G_1$ forms a triangle with at most
one edge of ABC, because G contains no 4-clique. Thus G contains at most

$$\frac{(n-3)^3}{27} + \frac{(n-3)^2}{3} + (n - 3) + 1 = \frac{n^3}{27}$$

triangles, establishing the inductive step. Note that the result is optimal if n is a multiple
of 3, since we can consider the tripartite graph $K_{n/3, n/3, n/3}$. □


It is a standard result that a graph with $n \geq 4$ vertices and more than
$\frac{n + n\sqrt{4n-3}}{4}$ edges has a four-cycle (see, for instance [3], example 3, chapter 22).
The following nice problem shows that this result is almost optimal.


9. a) Let p be a prime. Consider the graph whose vertices are the ordered
pairs (x, y) with $x, y \in \{0, 1, \ldots, p-1\}$ and whose edges join vertices
(x, y) and (x’, y’) if and only if $xx' + yy' \equiv 1 \pmod{p}$. Prove that
this graph does not contain a 4-cycle.

b) Prove that for infinitely many n there is a graph $G_n$ with n vertices
and at least $\frac{n\sqrt{n}}{2} - n$ edges that does not contain a 4-cycle.


Hungary-Israel Competition 2001 


Proof. One can give a rather down-to-earth proof of a), based on explicit
computations, but we prefer the following approach. Consider the $F_p = Z/pZ$-
vector space $V = F_p^2$ and the standard inner product $\langle (x, y), (x', y') \rangle = xx' + yy'$.
We are asked to prove that we cannot find four distinct vectors
$v_1, v_2, v_3, v_4 \in V$ such that $\langle v_i, v_{i+1} \rangle = 1$ for all $1 \leq i \leq 4$ (here $v_5 = v_1$).
If such vectors existed, $v_2$ and $v_4$ would be both orthogonal to the nonzero
vector $v_1 - v_3$ and so they would be contained in a line of V. Thus, we can
find $\lambda \in F_p$ such that $v_4 = \lambda v_2$. Then $\lambda = 1$, as $\langle v_1, v_4 \rangle = \langle v_1, v_2 \rangle = 1$. We
conclude that $v_2 = v_4$, a contradiction.

For the second part, take any prime p and set $n = p^2$. The graph $G_n$ in
the first part of the problem has n vertices. Let us evaluate the number of
edges. If v = (x, y) is a nonzero vector in V, then the equation $\langle w, v \rangle = 1$ has
p solutions. Thus the number of edges is $\frac{p(p^2-1)}{2}$ (solutions with w = v give
loops; there are at most p + 1 of them, and discarding them does not affect the
estimate below) and it is immediate to check
that this is greater than $\frac{n\sqrt{n}}{2} - n = \frac{p^3}{2} - p^2$. The result follows. □
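For small primes the construction can be verified by brute force. The sketch below uses our own function names; it builds the graph without loops (a vertex v with $\langle v, v \rangle = 1$ is simply not joined to itself, which is our convention) and uses the fact that a simple graph has a 4-cycle exactly when two distinct vertices share at least two neighbors.

```python
from itertools import combinations


def build_graph(p):
    """Adjacency sets of the graph on F_p^2 with v ~ w iff <v, w> = 1 (mod p), v != w."""
    vertices = [(x, y) for x in range(p) for y in range(p)]
    adj = {v: set() for v in vertices}
    for v, w in combinations(vertices, 2):
        if (v[0] * w[0] + v[1] * w[1]) % p == 1:
            adj[v].add(w)
            adj[w].add(v)
    return adj


def has_four_cycle(adj):
    """True iff some pair of distinct vertices has two or more common neighbours."""
    return any(len(adj[u] & adj[w]) >= 2 for u, w in combinations(adj, 2))


def edge_count(adj):
    """Number of edges, each counted once."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        graph = build_graph(p)
        n = p * p
        print(p, has_four_cycle(graph), edge_count(graph), n * p / 2 - n)
```

The last printed column is $\frac{n\sqrt{n}}{2} - n$ with $n = p^2$, to be compared with the edge count next to it.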


We continue with two very nice problems concerning graphs with $n^2 + 1$
edges and 2n vertices. Note that this is the first case when Turán’s theorem
ensures the existence of a triangle. The following results show that we can do
better.


10. Prove that a graph with 2n vertices and $n^2 + 1$ edges contains two
triangles sharing a common edge.


Chinese TST 1987 


Proof. We will prove the result by induction. For n = 2, this is trivial. Assume
now that the result holds for n and consider a graph G with 2n + 2 vertices
and $n^2 + 2n + 2$ edges. By Turán’s theorem and the pigeonhole principle,
we can find a triangle xyz such that $d(x) \equiv d(y) \pmod{2}$. Consider the 2n
points different from x, y. If there are at least $n^2 + 1$ edges among them, we
are done by the inductive hypothesis. If not, then there are at least 2n + 2
edges whose endpoints are x or y. The number of such edges is $d(x) + d(y) - 1$,
which is odd since $d(x) \equiv d(y) \pmod{2}$, so it is at least 2n + 3. It follows
that there are actually at least 2n + 2 edges whose endpoints are x or y and
which are different from the edge xy. Let $A_1$ be the set of vertices $v \neq x, y$
that are connected to x and let $A_2$ be the set of vertices $v \neq x, y$ that are
connected to y. Then the previous result implies that $|A_1| + |A_2| \geq 2n + 2$.
Since $|A_1 \cup A_2| \leq 2n$, it follows that $|A_1 \cap A_2| \geq 2$ and so we can find two
distinct vertices $z \neq t$ that are connected to both x, y. Then the triangles xyz
and xyt share a common edge and we are done. □


11. A graph has 2n vertices and $n^2 + 1$ edges. Prove that it contains at least
n triangles.




Proof. Again, the proof is by induction. For n = 2 this is trivial, so assume
that the result holds for n and consider a graph with 2n + 2 vertices and
$n^2 + 2n + 2$ edges. By Turán’s theorem we can find a triangle $x_1 x_2 x_3$. Let
$A_i$ be the set of vertices $v \neq x_1, x_2, x_3$ that are connected to $x_i$. There are
obviously at least

$$S = |A_1 \cap A_2| + |A_2 \cap A_3| + |A_3 \cap A_1|$$

triangles different from $x_1 x_2 x_3$, since any element of $A_i \cap A_j$ forms a triangle
with the vertices $x_i, x_j$. Thus, if we have $S \geq n$, we are done. Assume that
$S \leq n - 1$. Then, using the inclusion-exclusion principle, it follows that

$$2n - 1 \geq |A_1 \cup A_2 \cup A_3| \geq |A_1| + |A_2| + |A_3| - n + 1,$$

thus one of the numbers $|A_1| + |A_2|$, $|A_2| + |A_3|$, $|A_3| + |A_1|$ is smaller than
$2n - 1$, say $|A_1| + |A_2|$. Hence at most $|A_1| + |A_2| + 3 \leq 2n + 1$ edges have
$x_1$ or $x_2$ as an endpoint, which means that there are at least

$$n^2 + 2n + 2 - (2n + 1) = n^2 + 1$$

edges among the remaining 2n vertices $x_3, x_4, \ldots, x_{2n+2}$. Thus, by induction we find at
least n triangles among these vertices and adding the triangle $x_1 x_2 x_3$ yields
at least n + 1 triangles. The inductive step is thus proved and the problem
solved. □


The following problems are more difficult. The next one concerns cover- 
ings of the edges of a graph by cliques. 


12. There are n aborigines on an island. Any two of them are either friends 
or enemies. One day they receive an order saying that all citizens should 
make and wear a necklace with zero or more stones so that 


i) for any pair of friends there exists a color such that each of the two 
persons has a stone of that color; 
ii) for any pair of enemies there does not exist such a color. 
What is the least number of colors of stones required (considering all 
possible relationships between the inhabitants of the island)? 
Belarus 2001 




Proof. Let G be the graph whose vertices are the aborigines, two of them being 
connected if they are friends. Let B; be the set of aborigines who receive a 
bead of the i-th color. Then by condition ii) each B; forms a clique in G and 
by i) each edge is in one of these cliques. Conversely, if we have a collection 
of k cliques that cover every edge of G, then we can give a bead of color i to 
every aborigine in the i-th clique and satisfy the conditions of the order.

The previous paragraph shows that we need to find the smallest number
f(n) such that the edges of any graph G on n vertices can be covered by f(n)
cliques. Clearly f(2) = 1 and f(3) = 2. To find an optimal graph, consider a complete
bipartite graph on n vertices with parts of sizes $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$.
The key point is that such a graph has no
triangles, so if we want to cover its edges by cliques, we need at least E(G)
cliques, where E(G) is the number of edges. Therefore, we have a clear bound
on f(n), namely $f(n) \geq \left\lfloor \frac{n^2}{4} \right\rfloor$.

The hard part is to prove the opposite inequality. We will prove this by
induction, going from n to n + 2 (which, combined with the first two values
of f(n), will be enough to conclude). Consider a graph G with n + 2 vertices.
We may assume that at least two vertices are connected, say x, y. Consider
the graph $G_1$ obtained by deleting x, y and all edges adjacent to these two
vertices. By induction, we know that we can cover its edges by f(n) cliques.
If v is a vertex of $G_1$, consider the subgraph spanned by x, y, v. By adding
either a triangle or an edge, we can cover all edges of this subgraph incident
to v. Finally, by adding one more edge to cover the edge xy, we covered all
edges of G by using at most f(n) + n + 1 cliques (by the way, all the cliques
used were only edges or triangles). Therefore

$$f(n + 2) \leq f(n) + n + 1.$$

Since

$$\left\lfloor \frac{(n+2)^2}{4} \right\rfloor = \left\lfloor \frac{n^2}{4} \right\rfloor + n + 1,$$

it follows that $f(n) \leq \left\lfloor \frac{n^2}{4} \right\rfloor$. Therefore the answer is $\left\lfloor \frac{n^2}{4} \right\rfloor$. □
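The inductive step is effectively an algorithm, and it can be sketched as follows. The names `clique_cover`, `random_graph` and `is_valid_cover` are ours; the recursion removes an edge xy, covers the rest, and then adds at most one triangle or edge per remaining vertex plus the edge xy, exactly as in the argument above.

```python
import random
from itertools import combinations


def clique_cover(vertices, adj):
    """Cover the edges induced on `vertices` by edges and triangles."""
    vertices = list(vertices)
    edge = next(((x, y) for x, y in combinations(vertices, 2) if y in adj[x]), None)
    if edge is None:          # no edge left: nothing to cover
        return []
    x, y = edge
    rest = [v for v in vertices if v not in (x, y)]
    cover = clique_cover(rest, adj)
    for v in rest:            # at most one new clique per remaining vertex
        if v in adj[x] and v in adj[y]:
            cover.append((x, y, v))
        elif v in adj[x]:
            cover.append((x, v))
        elif v in adj[y]:
            cover.append((y, v))
    cover.append((x, y))      # one more clique for the edge xy itself
    return cover


def random_graph(n, prob, seed):
    """Adjacency sets of a random graph on 0..n-1 (for testing only)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < prob:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_valid_cover(adj, cover):
    """Every listed set is a clique and every edge lies in some listed set."""
    cliques_ok = all(b in adj[a] for c in cover for a, b in combinations(c, 2))
    covered = {frozenset(pair) for c in cover for pair in combinations(c, 2)}
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    return cliques_ok and edges <= covered
```

On a graph with n vertices the procedure uses at most $\lfloor n^2/4 \rfloor$ cliques, by the same recurrence as in the proof; on a complete bipartite graph with equal parts it is forced to use exactly that many.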


It is not hard to guess the answer of the following problem, but proving 
that this is the correct answer is quite a pain in the neck. 




13. What is the least number of edges in a connected n-vertex graph such 
that any edge belongs to a triangle? 


Paul Erdős, AMM E 3255


Proof. Let f(n) be the desired number and let $g(n) = \left\lfloor \frac{3n-2}{2} \right\rfloor$. We will prove
that f(n) = g(n) for all n. To save words, call good a connected graph in 
which every edge belongs to a triangle. 

We can easily establish that f(n) < g(n) by explicit constructions: if 
n = 2k+1, consider k triangles sharing a common vertex, this being the only 
common point of any two triangles. This is a good graph with 3k edges, so 
f(2k +1) < 3k = g(2k +1). If n = 2k, start from the above good graph with 
order 2k — 1, choose an edge AB and add a vertex X and edges AX, BX. We 
obtain a good graph with n vertices and g(n) edges. 

The hard point is proving that any good graph on n vertices has at least 
g(n) edges. We will prove this by strong induction. For n = 3,4, this is easily 
checked. The crucial observation that makes the induction work is that a good 
graph with n vertices and less than g(n) edges must have a vertex of degree 
2. This is trivial, since by connectedness all vertices have degree at least 1 
and since every edge is in a triangle, every vertex must have degree at least 2. 
However, the sum of the degrees of all vertices is twice the number of edges, 
thus smaller than 3n. 

Suppose now that f(j) = g(j) for all j < n and let us prove that $f(n) \geq g(n)$.
Suppose G is a connected graph with f(n) edges such that each edge
belongs to a triangle and suppose f(n) < g(n). There is a vertex x of G of
degree 2, which by assumption must be in some triangle xyz. Suppose the edge
yz is in a second triangle. Then the graph G’ obtained from G by removing x
and the edges xy and xz is connected, has $n - 1$ vertices, every edge belongs to
a triangle, and $f(n) - 2 < g(n) - 2 \leq g(n - 1)$ edges. This is a contradiction.
Thus yz is not contained in another triangle. Form a new graph G” by deleting
the vertex x and collapsing the vertices y and z into a single new vertex w,
where a vertex v of G” is joined to w if in G it was joined to either of y or z.
Clearly G” is a connected graph with $n - 2$ vertices and every edge of G” is
contained in a triangle. In forming G”, we lost three edges xy, yz, and xz. We
would also lose an edge if some vertex v were adjacent to both y and z, but by
assumption this does not occur. Thus G” has $f(n) - 3 < g(n) - 3 = g(n - 2)$
edges. Again this contradicts the induction hypothesis. Thus we must have
f(n) = g(n). □
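The extremal construction from the first part of the proof is easy to generate and test. In this sketch the vertex labels (hub 0, extra vertex n - 1 attached to the edge {0, 1}) and the function names are our own conventions, and the edge count is compared with $\lfloor (3n-2)/2 \rfloor$, which agrees with the values 3k and 3k - 1 computed in the text for n = 2k + 1 and n = 2k.

```python
def good_graph(n):
    """Edges of the construction on vertices 0..n-1 (n >= 3): triangles through a hub."""
    edges = set()
    for t in range((n - 1) // 2):          # triangles {0, 2t+1, 2t+2}
        a, b = 2 * t + 1, 2 * t + 2
        edges |= {(0, a), (0, b), (a, b)}
    if n % 2 == 0:                         # even n: glue a new vertex onto the edge {0, 1}
        x = n - 1
        edges |= {(0, x), (1, x)}
    return edges


def is_good(n, edges):
    """Connected, and every edge lies in a triangle."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:                           # depth-first search from vertex 0
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n and all(adj[a] & adj[b] for a, b in edges)


if __name__ == "__main__":
    for n in range(3, 11):
        edges = good_graph(n)
        print(n, len(edges), (3 * n - 2) // 2, is_good(n, edges))
```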


The following result is classical and very nice. We give two proofs, the 
first one being an explicit inductive construction, the second one using the 
powerful probabilistic method of Erdős. We refer the reader to the addendum 
6.B for more details on this versatile tool. 


14. Prove that for every n there is a graph with no triangles and whose 
chromatic number is at least n. 


Mycielski’s theorem 


Before starting the proof, let us recall some definitions. A k-proper coloring 
of a graph G is a coloring of its vertices with at most k colors such that each 
vertex receives one color and no two adjacent vertices have the same color. 
The chromatic number of a graph is the smallest number k for which G has a 
proper k-coloring. 


Proof. We will prove this by induction on n. For n = 1 everything is clear,
so assume that we have a graph G with no triangles and chromatic number
at least n and let us construct another graph with no triangles and chromatic
number at least n + 1. Let χ(G) be the chromatic number of G. We may assume
that χ(G) = n (otherwise keep G for the inductive step). Let $v_1, \ldots, v_k$ be
the vertices of G and consider the following new graph G’: the set of vertices
consists of $v_1, \ldots, v_k$, together with vertices $x_1, \ldots, x_k, y$ (such that the 2k + 1
vertices thus obtained are distinct). Connect $x_i$ with all neighbors of $v_i$ and
connect y with $x_1, \ldots, x_k$. Also, keep the edges in G. We claim that G’ is
triangle free and $\chi(G') \geq n + 1$. It is clear by construction that G’ has no
triangles. Also, we easily have $\chi(G') \leq \chi(G) + 1$, since any proper n-coloring
of G extends to a proper (n + 1)-coloring of G’ (assign the same color to $x_i$ as
to $v_i$ and assign the color n + 1 to y).

The difficult part is proving that this is actually an equality. Suppose
thus that G’ has chromatic number at most n and take any proper n-coloring
of G’. We will construct a proper (n − 1)-coloring of G, which will contradict
the fact that G has chromatic number n. Assume, without loss of generality,
that y has color n and let A be the set of vertices of G whose color is n and
for each $v_i \in A$ replace its color by that of $x_i$ (which is not n, since $x_i$ is
adjacent to y). We claim that this gives a
proper (n − 1)-coloring of G. Note that no two vertices of A are adjacent, as
all these vertices have the same color. Now, if $v_i \in A$ is adjacent to $v_j \notin A$,
then $v_j$ is adjacent to $x_i$, so that they have different colors. Consequently, we
found a proper (n − 1)-coloring of G, which is impossible. We deduce that the
chromatic number of G’ is at least n + 1. □
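The construction can be run on small cases. The following sketch uses our own labelling (vertices 0..k-1 for the $v_i$, k..2k-1 for the $x_i$, and 2k for y) and a naive backtracking search for the chromatic number, which is only practical for very small graphs.

```python
from itertools import combinations


def mycielski(k, edges):
    """Apply the construction to a graph on 0..k-1; returns (vertex count, edge set)."""
    new_edges = set(edges)
    for a, b in edges:
        new_edges.add((a, k + b))      # x_b is joined to the neighbour a of v_b
        new_edges.add((b, k + a))      # x_a is joined to the neighbour b of v_a
    for i in range(k):
        new_edges.add((k + i, 2 * k))  # y is joined to every x_i
    return 2 * k + 1, new_edges


def adjacency(n, edges):
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_triangle_free(n, edges):
    adj = adjacency(n, edges)
    return all(not (adj[a] & adj[b]) for a, b in edges)


def chromatic_number(n, edges):
    """Smallest number of colours in a proper colouring (exhaustive search)."""
    adj = adjacency(n, edges)
    colour = [None] * n

    def extend(v, limit, used):
        if v == n:
            return True
        # Symmetry breaking: vertex v may reuse an old colour or open one new colour.
        for c in range(min(used + 1, limit)):
            if all(colour[u] != c for u in adj[v]):
                colour[v] = c
                if extend(v + 1, limit, max(used, c + 1)):
                    return True
        colour[v] = None
        return False

    return next(c for c in range(1, n + 1) if extend(0, c, 0))


if __name__ == "__main__":
    n, edges = 2, {(0, 1)}             # start from a single edge
    for _ in range(3):
        print(n, is_triangle_free(n, edges), chromatic_number(n, edges))
        n, edges = mycielski(n, edges)
```

Starting from a single edge, the first two iterations give a 5-cycle and then an 11-vertex graph, and the chromatic number goes up by one each time while no triangle appears.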


Proof. We will actually prove a stronger result, which is a classical theorem of 
Erdős. We will use the probabilistic method, for which the reader is referred 
to the addendum 6.B. 


Theorem 6.2. For any positive integers k,n there exists a graph with chro- 
matic number greater than k and such that each cycle has length greater than 
n. 


Take N sufficiently large and consider random graphs with N vertices
$G_{N,p}$, each edge appearing independently, with probability $p = N^{c-1}$ for some
$0 < c < \frac{1}{n}$. If X is the number of cycles of length at most n in $G_{N,p}$ (which
cycles we will call short in the sequel), then clearly

$$E[X] \leq \sum_{i=3}^{n} N^i p^i \leq \frac{N^{cn}}{1 - N^{-c}},$$

since there are at most $N^i$ cycles of length i, each appearing with probability
$p^i$. Since the last quantity is smaller than $\frac{N}{4}$ for sufficiently large N, it follows
by Markov’s inequality that

$$P\left( X \geq \frac{N}{2} \right) < \frac{1}{2}.$$

Let α(G) be the size of the largest independent set in G (the independence
number of the graph) and let χ(G) be the chromatic number of G. Then it
is easy to see that $\chi(G) \geq \frac{N}{\alpha(G)}$ if G has N vertices. Taking $a = \left\lceil \frac{3}{p} \ln N \right\rceil$,
the probability that $\alpha(G) \geq a$ is at most $\binom{N}{a} (1 - p)^{\binom{a}{2}}$ (one can choose a
vertices in $\binom{N}{a}$ ways and the probability that a given set of a
vertices is independent is $(1 - p)^{\binom{a}{2}}$). An easy computation shows
that the last quantity tends to 0 as $N \to \infty$. Thus for N sufficiently large
we have $P(\alpha(G) \geq a) < \frac{1}{2}$ and so there is a graph $G_{N,p} = G$ with $X < \frac{N}{2}$
and $\alpha(G) < a$. Now, delete one vertex from each short cycle arbitrarily. The
remaining graph $G_1$ has at least N/2 vertices, no cycles of length at most n
and independence number smaller than a. Thus

$$\chi(G_1) \geq \frac{N/2}{3 N^{1-c} \ln N} = \frac{N^c}{6 \ln N} > k$$

for sufficiently large N. The result follows. □


We end this chapter with three gems in extremal graph theory, all taken 
from mathematical contests. 


15. For a pair $A = (x_1, y_1)$ and $B = (x_2, y_2)$ of points on the coordinate
plane, let
$$d(A, B) = |x_1 - x_2| + |y_1 - y_2|.$$
We call a pair (A, B) of (unordered) points harmonic if $1 < d(A, B) \leq 2$.
Determine the maximum number of harmonic pairs among 100 points
in the plane.
USA TST 2006


Proof. Consider the graph with vertices on the 100 points and whose edges
connect points of harmonic pairs. The key point is that this graph contains no
5-clique. Indeed, if $P_i(x_i, y_i)$ are five vertices, any two of which are connected,
then for all $i \neq j$ we have $1 < d(P_i, P_j) \leq 2$. Order these points such that
$x_1 \leq x_2 \leq \cdots \leq x_5$. It is easy to see that among the numbers $y_1, y_2, \ldots, y_5$
one can find three forming a monotonic sequence. Call these three numbers
$y_{i_1}, y_{i_2}, y_{i_3}$, with $i_1 < i_2 < i_3$. Then

$$d(P_{i_1}, P_{i_2}) + d(P_{i_2}, P_{i_3}) = d(P_{i_1}, P_{i_3}).$$

This is however impossible, since $d(P_{i_1}, P_{i_2}), d(P_{i_2}, P_{i_3}), d(P_{i_1}, P_{i_3}) \in (1, 2]$.
This finishes the proof of the fact that our graph contains no $K_5$. By Turán’s
theorem, it has at most $\frac{3}{8} \cdot 100^2 = 3750$ edges and so there are always at most
3750 harmonic pairs.

To finish the proof, it remains to exhibit a configuration having 3750
harmonic pairs. It is enough to distribute the 100 points near the points
$(0, \pm\frac{3}{4})$ and $(\pm\frac{3}{4}, 0)$ (any two of which are at distance $\frac{3}{2}$), having 25 points
near each of these four points. □
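A four-cluster configuration of this kind can be checked directly. The sketch below is illustrative only: the cluster centres at distance 3/4 from the origin on the axes and the spacing 1/1000 inside each cluster are our assumptions (any centre distance in (1/2, 1] combined with small enough clusters works the same way), and exact rational arithmetic avoids rounding issues at the endpoints.

```python
from fractions import Fraction
from itertools import combinations


def harmonic_pairs(points):
    """Number of unordered pairs with 1 < d(A, B) <= 2 for the taxicab distance d."""
    def d(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])
    return sum(1 for a, b in combinations(points, 2) if 1 < d(a, b) <= 2)


def clustered_points(per_cluster=25, radius=Fraction(3, 4), step=Fraction(1, 1000)):
    """Four tight clusters around (0, +-radius) and (+-radius, 0)."""
    centres = [(0, radius), (0, -radius), (radius, 0), (-radius, 0)]
    return [(cx + i * step, cy) for cx, cy in centres for i in range(per_cluster)]


if __name__ == "__main__":
    points = clustered_points()
    print(len(points), harmonic_pairs(points))
```

Points in the same cluster are too close to form a harmonic pair, while points in different clusters are at distance close to 3/2, so the count is $\binom{4}{2} \cdot 25^2$.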


16. For a finite graph G let f(G) be the number of triangles formed by the
edges of G and let g(G) be the number of tetrahedra formed by the edges
of G. Find the least constant c such that $g(G)^3 \leq c \cdot f(G)^4$ for any finite
graph G.


IMO Shortlist 2004 


Proof. Let us begin by considering the case of a complete graph $K_n$. Then
obviously

$$f(K_n) = \binom{n}{3} \quad \text{and} \quad g(K_n) = \binom{n}{4}.$$

Thus

$$c \geq \frac{g(K_n)^3}{f(K_n)^4} = \frac{\binom{n}{4}^3}{\binom{n}{3}^4}$$

and this happens for all n. Taking the limit as $n \to \infty$, we deduce that $c \geq \frac{3}{32}$.

The hard part is to prove that the value $\frac{3}{32}$ actually holds for arbitrary
graphs G. To do this, we will first consider the two dimensional version of the
problem, which is comparing the number of triangles and edges in a graph.
The reason is simple: computing tetrahedra in a graph G comes down to
computing triangles in the subgraphs induced by the neighbors of each vertex of G and taking
the sum of these numbers of triangles (and then dividing by 4, since each
tetrahedron is counted four times).

Therefore, consider first any graph G with n vertices $x_1, x_2, \ldots, x_n$ and
e edges. Let us bound the number of triangles in terms of e. If $d(x_i)$ is
the degree of the vertex $x_i$, then there are at most $\binom{d(x_i)}{2} \leq \frac{d(x_i)^2}{2}$ triangles
containing the vertex $x_i$. But the number of triangles containing the vertex
$x_i$ is also obviously bounded by the number of edges of G, that is e. Since the
smaller of two numbers is at most their geometric mean, for
each vertex $x_i$ there are at most $d(x_i) \sqrt{\frac{e}{2}}$ triangles containing the vertex
$x_i$. Summing over i and taking into account that we count each triangle three
times in this way, we obtain that

$$f(G) \leq \frac{1}{3} \sqrt{\frac{e}{2}} \sum_{i=1}^{n} d(x_i) = \frac{2e}{3} \sqrt{\frac{e}{2}}.$$

Note that the constant appearing in this estimate is optimal, by taking
$G = K_n$ and then letting $n \to \infty$. This already shows that we are on the right
path, since we solved the two dimensional version of the problem.

To solve the original problem, all we have to do is to repeat some of the
arguments in the previous paragraph. Namely, take a graph G with vertices
$x_1, \ldots, x_n$ and fix a vertex $x_i$. The number of tetrahedra containing $x_i$ as
a vertex is the number of triangles in the subgraph induced by the vertices
connected to $x_i$. Let $e_i$ be the number of edges in this subgraph, then the
previous paragraph shows that there are at most $\frac{2e_i}{3} \sqrt{\frac{e_i}{2}}$ tetrahedra containing
$x_i$ as vertex. Also, note that we have $3f(G) = \sum_{i} e_i$, since for each vertex $x_i$
there are $e_i$ triangles containing $x_i$ as vertex (and we count each triangle three
times this way). However, one still needs an observation to end the proof: we
gave an estimate for the number of tetrahedra having $x_i$ as vertex in terms
of $e_i$, but there is also an obvious estimate: this number is at most f(G).
Therefore, combining the two estimates with weights 2/3 and 1/3, there are at most
$\sqrt[3]{\frac{2}{9} f(G)} \cdot e_i$ tetrahedra containing $x_i$. Summing
over i and taking into account that we count each tetrahedron four times, we
finally obtain

$$4g(G) \leq \sqrt[3]{\frac{2}{9} f(G)} \sum_{i} e_i = \sqrt[3]{\frac{2}{9} f(G)} \cdot 3f(G).$$

Taking the cube of the last inequality yields the desired estimate

$$g(G)^3 \leq \frac{3}{32} f(G)^4$$

and ends the proof. □
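The behaviour of the ratio on complete graphs can be computed exactly. In this sketch the function name `ratio` is ours; it evaluates $\binom{n}{4}^3 / \binom{n}{3}^4$ with exact rational arithmetic and compares it with the closed form $\frac{3}{32} \cdot \frac{(n-3)^3}{n(n-1)(n-2)}$, which follows by cancelling the common factors.

```python
from fractions import Fraction
from math import comb


def ratio(n):
    """g(K_n)^3 / f(K_n)^4 as an exact fraction (n >= 3)."""
    return Fraction(comb(n, 4) ** 3, comb(n, 3) ** 4)


def closed_form(n):
    """The same quantity after cancellation: (3/32) (n-3)^3 / (n (n-1) (n-2))."""
    return Fraction(3, 32) * Fraction((n - 3) ** 3, n * (n - 1) * (n - 2))


if __name__ == "__main__":
    for n in (4, 10, 100, 10_000):
        print(n, float(ratio(n)))
    print("limit", 3 / 32)
```

The ratio stays strictly below 3/32 for every n and approaches it as n grows, in agreement with the lower bound obtained at the start of the proof.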


17. Let k be a positive integer. A graph whose vertex set is the set of positive
integers does not contain any complete k x k bipartite subgraph. Prove
that there are arbitrarily long arithmetic progressions of positive integers
such that no two elements of the progression are joined by an edge in
this graph.


KoMaL 


Proof. We will first prove the following result, of independent interest: 


Lemma 6.3. Let $a \geq 2$ and let $K_{a,a}$ be the complete bipartite graph with a
vertices in each half of the partition. There is a constant c > 0 such that a
graph with n vertices and without $K_{a,a}$ has at most $c n^{2 - \frac{1}{a}}$ edges.


Proof. Let G be such a graph and call 1, 2, ..., n its vertices. Let $d_i$ be the
degree of vertex i. Let us count pairs $(v, \{v_1, \ldots, v_a\})$, where $\{v_1, \ldots, v_a\}$ is
a set of a (distinct) vertices, all connected to v. For each v = i, there are
precisely $\binom{d_i}{a}$ sets $\{v_1, \ldots, v_a\}$ sharing a pair with v. Thus we find $\sum_{i=1}^{n} \binom{d_i}{a}$
pairs in total. On the other hand, for each set of a vertices $\{v_1, \ldots, v_a\}$ there
are at most $a - 1$ vertices v sharing a pair with $\{v_1, \ldots, v_a\}$. Thus

$$\sum_{i=1}^{n} \binom{d_i}{a} \leq (a - 1) \binom{n}{a}.$$

Let q be the number of edges in the graph and let $f(x) = \binom{x}{a}$ if $x \geq a - 1$
and f(x) = 0 for $0 \leq x < a - 1$. Then f is convex and since $\sum_{i=1}^{n} d_i = 2q$, we
deduce that

$$(a - 1) \binom{n}{a} \geq n f\left( \frac{2q}{n} \right).$$

This yields the rough estimate

$$a n^a \geq n \left( \frac{2q}{n} - a + 1 \right)^a,$$

from which the result follows immediately. □


Coming back to the proof, fix an integer m > 1 and suppose that among
any m vertices forming an arithmetic progression, at least two are connected.
Fix a large integer N and consider all arithmetic progressions with m terms,
all among 1, 2, ..., N. There are at least

$$\sum_{a=1}^{N-1} \left( \frac{N - a}{m - 1} - 1 \right) = \frac{N(N-1)}{2(m-1)} - N + 1$$

such arithmetic progressions, since each is determined by a pair (a, d) of positive
integers such that $a + (m - 1)d \leq N$. This is greater than $c(m) N^2$ for N
sufficiently large, where c(m) > 0 is a constant depending only on m. Since
each of these progressions contributes an edge to the graph $G_N$ (the subgraph
of G induced by the vertices 1, 2, ..., N), and since each edge is counted at
most $m^2$ times, we deduce that $G_N$ has at least $\frac{c(m)}{m^2} N^2$ edges for all sufficiently
large N. Since $\frac{c(m)}{m^2} N^2 > c N^{2 - \frac{1}{k}}$ for all sufficiently large N (and this
for any constant c > 0), it follows by the previous lemma that $G_N$ contains a
$K_{k,k}$ for all sufficiently large N. Since this contradicts the hypothesis on G,
the conclusion follows. □
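The count of progressions used above is easy to confirm by brute force. In this sketch the function names are ours; a progression is identified with its first term a and its common difference d, both at least 1, subject to a + (m - 1)d ≤ N.

```python
from fractions import Fraction


def count_progressions(N, m):
    """Number of pairs (a, d) with a, d >= 1 and a + (m - 1) d <= N (needs m >= 2)."""
    return sum((N - a) // (m - 1) for a in range(1, N))


def count_progressions_brute(N, m):
    """The same count by direct enumeration."""
    return sum(
        1
        for a in range(1, N + 1)
        for d in range(1, N + 1)
        if a + (m - 1) * d <= N
    )


def lower_bound(N, m):
    """The quantity N(N-1)/(2(m-1)) - N + 1 from the proof, computed exactly."""
    return Fraction(N * (N - 1), 2 * (m - 1)) - N + 1


if __name__ == "__main__":
    for N, m in [(10, 3), (50, 4), (200, 5)]:
        print(N, m, count_progressions(N, m), float(lower_bound(N, m)))
```

For fixed m the count grows like $N^2 / (2(m-1))$, which is the quadratic growth the argument needs.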


6.1 Notes 


We would like to thank the following people for providing solutions to 
some of the problems presented in this chapter: Alon Amit (problem 12), 
Alexandru Chirvăsitu (problem 10), Xiangyi Huang (problem 3), Szymon Ku- 
bicius (problem 4), Joel Brewster Lewis (problems 2, 7, 15), Fedja Nazarov 
(problem 14), Richard Stong (problem 13), Gjergji Zaimi (problem 5). 


Addendum 6.A Some Pearls of Extremal 
Graph Theory 


The purpose of this addendum is to present some beautiful theorems 
from extremal graph theory, that nicely complement the more elementary 
results discussed in chapter 6. The main result is the famous Szemerédi-Trotter
theorem, a deep result in incidence geometry which has a lot of wonderful 
geometric consequences. For instance, it is the basic tool in dealing with 
natural (but very hard) questions such as: if we are given n points in the plane, 
how many distinct distances do they always determine, what is the greatest 
number of segments of length 1 (or triangles of area 1) they determine, etc. 
Thanks to a brilliant observation of Elekes, it also plays an important role in 
additive combinatorics, giving nontrivial bounds on the so-called sum-product 
problem, a major research problem in modern combinatorics. 

The results discussed in this addendum found wide ranges of applications 
and their extensions are a very hot topic of research. See the papers [1], [11], 
[15], [22], [23], [86], [74], [72], [73], [75], [80], [81], and the excellent book 
[82] for more details, as we will only scratch the surface of the subject (but 
hopefully that will be enough to convince the reader of the beauty of these 
results). Also, some proofs in this addendum use the probabilistic method, so 
the reader is invited to take a look at addendum 6.B for details on this method 
and for probabilistic vocabulary. 


6.A.1 The Szemerédi-Trotter theorem 


The famous Szemerédi-Trotter theorem deals with the number of inter- 
sections between geometric objects, giving a fairly nontrivial (and even sharp 
up to absolute constants) upper bound for the number of incidences between 
a family of points and a family of curves. Since we do not want to delve into 
subtle topological considerations, let us agree that curves will always mean 
arcs of circles or polygonal lines, as this appears in all applications we have in 
mind. 


Theorem 6.A.1. (Szemerédi-Trotter) Consider n points $P_1, P_2, \ldots, P_n$ and
m curves $C_1, C_2, \ldots, C_m$ in the plane. Suppose that $C_i$ and $C_j$ have at most
one common point for all $i \neq j$. Then the number of pairs (i, j) such that
$P_i \in C_j$ is at most $m + 4 \max(n, (mn)^{2/3})$.


In applications, the following immediate consequence is very handy: 


Corollary 6.A.2. Let S be a set of n points in the plane, L a collection of
curves such that any two curves have at most one common point. Suppose
that each curve in L contains at least $k \geq 2$ points of S. Then the number
of pairs of a point in S and a curve in L containing that point is at most
$2^9 \cdot \max\left( n, \frac{n^2}{k^2} \right)$.


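Proof. Let l = |L| and let p be the number of such pairs. The theorem
gives $p \leq l + 4 n^{2/3} \max(n^{1/3}, l^{2/3})$. By hypothesis, we have $p \geq lk$, therefore
$p - l \geq p \left( 1 - \frac{1}{k} \right) \geq \frac{p}{2}$. Hence our inequality becomes $p \leq 8 n^{2/3} \max(n^{1/3}, (p/k)^{2/3})$.
From this the result follows by considering two cases: if the maximum is $n^{1/3}$,
then $p \leq 8n$; otherwise $p^{1/3} \leq 8 n^{2/3} k^{-2/3}$, that is $p \leq 2^9 \frac{n^2}{k^2}$. □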

It is important to note that corollary 6.A.2 (and also Szemerédi-Trotter’s
theorem) is sharp up to a constant. Suppose that $2k^2 \leq n$ and consider the
set of points (x, y), where $1 \leq x \leq k$, $1 \leq y \leq \frac{n}{k}$ and x, y are integers. Also,
consider the set of lines y = mx + b, where m, b are positive integers such that
$1 \leq b \leq \frac{n}{2k}$ and $1 \leq m \leq \frac{n}{2k^2}$. It is easy to check that any such line contains
exactly k such points (namely all points (i, mi + b) with $1 \leq i \leq k$). Moreover,
there are about n points and about $\frac{n^2}{4k^3}$ lines. Also, if $k > \sqrt{n}$, one can simply
pick about n/k lines and put k points on each of them.
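The grid example can be counted explicitly for small parameters. In this sketch the function name and the divisibility requirement on n (so that n/k, n/(2k) and n/(2k^2) are integers) are our simplifying assumptions.

```python
def grid_example(k, n):
    """Return (number of points, number of lines, number of incidences)."""
    if n % (2 * k * k) != 0:
        raise ValueError("this sketch assumes that 2*k*k divides n")
    points = {(x, y) for x in range(1, k + 1) for y in range(1, n // k + 1)}
    lines = [
        (m, b)
        for m in range(1, n // (2 * k * k) + 1)
        for b in range(1, n // (2 * k) + 1)
    ]
    # The line y = m x + b meets the grid exactly in the points (x, m x + b).
    incidences = sum(
        1 for m, b in lines for x in range(1, k + 1) if (x, m * x + b) in points
    )
    return len(points), len(lines), incidences


if __name__ == "__main__":
    for k, n in [(2, 16), (3, 36), (4, 128)]:
        print(k, n, grid_example(k, n), n * n // (4 * k * k))
```

Each line carries exactly k grid points, so the number of incidences is $n^2 / (4k^2)$, which matches the bound of the corollary up to the constant.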

The original proof [81] of theorem 6.A.1 was very intricate. In a beautiful 
paper [80], Székely gave an amazing proof using a graph-theoretic result on 
crossing numbers. We will follow this path, but in order to do that we first 
need some terminology. Let G be a graph and consider an injective map that 
sends vertices of G to points of the plane. Draw an arc (in the plane) between 
any two points that come from the endpoints of an edge of G, such that except 
for the endpoints this arc does not meet any vertex. Call such a map a drawing 
of G. A crossing (or crossing point) is simply the intersection of two arcs not 
at the image of a vertex. Such a point belongs therefore to at least two arcs, 
but it is not an image of a vertex of the graph (with a piece of paper and a 
pencil in front of you, all these definitions should be obvious!). The number 




of crossings is the total number of crossing points, counted with multiplicities 
(i.e. if a crossing point belongs to k arcs, its multiplicity is $\binom{k}{2}$). It is also the
number of pairs of edges with no common endpoints and whose associated arcs 
intersect each other. After this dry list of obvious definitions, let us glorify 
one that will be very helpful in what follows: 


Definition 6.A.3. The crossing number of a graph G, denoted c(G), is the 
minimal number of crossings over all possible drawings of G. 


Note that, by definition, a graph is planar if and only if its crossing number 
is 0. The key ingredient in Székely’s proof of theorem 6.A.1 is the following 
deep theorem [1], that will be proved in the next section. 


Theorem 6.A.4. (Ajtai, Chvátal, Newborn, Szemerédi, Leighton) Let G be
a simple graph with e edges and n vertices. If $e \geq 4n$, then $c(G) \geq \frac{e^3}{64 n^2}$.

Let us see why theorem 6.A.4 implies theorem 6.A.1. We may assume that
every curve $C_j$ contains at least one of the points $P_1, P_2, \ldots, P_n$. Consider the
graph G whose vertices are $P_1, P_2, \ldots, P_n$ and whose edges connect adjacent
points on some $C_j$ (i.e. two points $P_i, P_k$ belonging to some $C_j$, such that
there is no other point $P_l$ between them on $C_j$). Let e be the number of edges
of G and let I be the number of incidences between points and curves (i.e.
the number of pairs (i, j) such that $P_i \in C_j$). As every curve contains at least
one point among $P_1, P_2, \ldots, P_n$, we have $e = I - m$ (since if a curve contains
s points, it yields $s - 1$ edges of G and edges coming from two different curves
are distinct). Now, since two curves intersect in at most one point, it follows
that $c(G) \leq m^2$. If e < 4n, we deduce that I < m + 4n and we are done.
Otherwise, the previous theorem yields $m^2 \geq \frac{e^3}{64 n^2}$, thus $e \leq 4 (nm)^{2/3}$ and so
$I \leq m + 4 (nm)^{2/3}$. The result follows.

6.A.2 Proof of theorem 6.A.4 and a generalization 


The proof of theorem 6.A.4 is simply beautiful: we will start with a rather 
weak inequality obtained from Euler’s formula and, using the probabilistic 
method, we will improve it drastically by averaging. Let us start with the 
weak inequality: 




Lemma 6.A.5. If G is a graph with e edges and n vertices, then $c(G) \geq e - 3n$.


Proof. First, we reduce the proof to the case when G is planar. To do this,
take a drawing of G with c(G) crossings and
remove edges of G which cross other edges until this is no longer possible, so
you end up with a graph G’ which is planar. Suppose that you removed k
edges of G. Each of them removes at least one crossing, so that if we accept
the truth of the lemma for G’, we can write

$$c(G) \geq c(G') + k \geq e(G') + k - 3n = e - 3n.$$

Now, assume that G is planar, so c(G) = 0. We need to prove that $e \leq 3n$.
If $n \leq 2$, this is clear, so suppose that $n \geq 3$. Let f be the number of (possibly
infinite) faces of G. By Euler’s formula we have n + f = e + 2. As every face
has at least three edges, we have $3f \leq 2e$. We easily deduce that $e \leq 3n - 6$,
finishing the proof of the lemma.

Finally, let us recall the proof of Euler’s formula: if you start with a single
vertex and try to reconstruct the graph, then every time you add an edge, you
either add a vertex or a face, so the quantity $n + f - e - 2$ does not change.
Since initially it is obviously zero, it is zero all the time. □


Now, we use the probabilistic method to improve the previous inequality. 
Take an arbitrary number p € (0, 1] and consider a random induced subgraph 
H of G, by picking each vertex of G independently and with probability p. As 
the probability of a given vertex to be in H is p, by linearity of expectation 
we have Ev(H)] = np, where v( H) is the number of vertices of H. Also, since 
an edge appears in H with probability p*, we have Ele(H)] = pe, where e 
(respectively e(H)) is the number of edges of G (respectively H). 

The previous lemma and linearity of expectation yield 


Elc(H)| > Ele(H)] — 3E[o(#)). 


We cannot easily express E|c()| in terms of c(G), but we can at least say that 
E[c(H)] < p*c(G). Indeed, take a drawing of G with exactly c(G) crossings 
and observe that the probability that a crossing survives in H is p*. The 
conclusion is that 

p'c(G) > Ele(H)] > p’e — 3np. 


206 Chapter 6. Some Classical Problems in Extremal Graph Theory 


As p was arbitrary, it is tempting to choose a value that will optimize the 
previous inequality. This value is $p = \frac{9n}{2e}$, but unfortunately it is not necessarily
smaller than 1. However, this is far from being a subtle issue: simply choose
something a bit smaller, namely $p = \frac{4n}{e}$. This finishes the proof of the theorem.
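
As a sanity check of this last step, here is the substitution written out. It is only a sketch under two assumptions that are not restated in this passage: that the hypothesis of theorem 6.A.4 is $e \ge 4n$ (so that $p = 4n/e$ lies in $(0, 1]$), and that the target bound is $e^3/(64n^2)$, the form in which theorem 6.A.4 is quoted later in this addendum.

```latex
% Divide  p^4 c(G) >= p^2 e - 3np  by p^4, then substitute p = 4n/e.
c(G) \;\ge\; \frac{e}{p^{2}} - \frac{3n}{p^{3}},
\qquad
p = \frac{4n}{e}
\;\Longrightarrow\;
c(G) \;\ge\; \frac{e^{3}}{16n^{2}} - \frac{3e^{3}}{64n^{2}}
      \;=\; \frac{e^{3}}{64n^{2}} .
```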

In geometric applications, it is useful to have a more flexible version of 
theorem 6.A.4, which allows graphs with multiple edges. This is the objective 
of the following theorem of Székely [80]. 


Theorem 6.A.6. Let $G = (V, E)$ be a multigraph with $n$ vertices and $e$ edges.
If the maximal multiplicity of the edges of $G$ is $m$, then $e < 32nm$ or
$$c(G) \ge \frac{e^3}{2^{16} n^2 m}.$$


Proof. The idea is to reduce everything to the case of simple graphs, where 
theorem 6.A.4 applies. The details are however a bit involved. For each 
$1 \le i \le \log_2 m + 1$, let $G_i = (V, E_i)$ be the subgraph of $G$ using only those
edges with multiplicity between $2^{i-1}$ and $2^i$ (more precisely, two vertices $a, b$
are joined by $k$ edges in $G_i$ if they are joined by $k$ edges in $G$ and $k \in [2^{i-1}, 2^i)$).
Let $e_i$ be the number of edges in $G_i$, without multiplicity. In order to apply
theorem 6.A.4, we will restrict to those $i$ for which $e_i \ge 4n$. Let $S$ be the set
of these $i$. As $G_i$ has maximal multiplicity at most $2^i$, we have (under the
assumption that $e \ge 32mn$, which will be made from now on)

$$\sum_{i \in S} |E(G_i)| = e - \sum_{i \notin S} |E(G_i)| \ge e - \sum_{i=1}^{1+\lfloor \log_2 m \rfloor} 4n \cdot 2^i \ge e - 16nm \ge \frac{e}{2}.$$

We claim that for $i \in S$ we have
$$c(G_i) \ge \frac{4^{i-1} e_i^3}{64 n^2}.$$


Indeed, pick an arbitrary drawing of $G_i$ and for every pair of vertices of $G_i$ with
a multi-edge between them, select $2^{i-1}$ of these edges and then arbitrarily and
independently only one of them. We get a family of $(2^{i-1})^{e_i}$ simple graphs,
each having $e_i \ge 4n$ edges (as $i \in S$) and so with crossing number at least
$\frac{e_i^3}{64n^2}$. Thus, over this whole family of graphs we obtain at least $2^{(i-1)e_i} \cdot \frac{e_i^3}{64n^2}$


6.A. Some Pearls of Extremal Graph Theory 207 


crossings. However, one has to pay attention to the fact that each crossing is
counted in $2^{(i-1)(e_i-2)}$ members of the family, so in total we obtain at least

$$\frac{2^{(i-1)e_i}}{2^{(i-1)(e_i-2)}} \cdot \frac{e_i^3}{64n^2} = \frac{4^{i-1} e_i^3}{64 n^2}$$

crossings in $G_i$, proving the claim.
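Finally, by definition of the crossing number, it is clear that

$$c(G) \ge \sum_{i} c(G_i) \ge \sum_{i \in S} c(G_i) \ge \sum_{i \in S} \frac{4^{i-1} e_i^3}{64 n^2} = \frac{1}{256 n^2} \sum_{i \in S} 4^i e_i^3.$$

Now, since

$$\sum_{i \in S} 2^i e_i \ge \sum_{i \in S} |E(G_i)| \ge \frac{e}{2},$$

it is natural to use Hölder's inequality in the form

$$\left(\sum_{i \in S} 4^i e_i^3\right) \left(\sum_{i \in S} 2^{i/2}\right)^2 \ge \left(\sum_{i \in S} 2^i e_i\right)^3 \ge \frac{e^3}{8}.$$

Combining this with the easy estimate

$$\sum_{i \in S} 2^{i/2} \le \sum_{i=1}^{1+\lfloor \log_2 m \rfloor} 2^{i/2} < 4\sqrt{2m}$$

yields
$$c(G) \ge \frac{1}{256 n^2} \cdot \frac{e^3}{8 \cdot 32m} = \frac{e^3}{2^{16} n^2 m},$$
finishing the proof. □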


6.A.3 An application to additive combinatorics 


In [23] Elekes made a nice connection between theorem 6.A.1 and a famous 
problem of Erdős and Szemerédi, the sum-product problem. This asks for a 


sharp lower bound of the expression $\max(|A + A|, |A \cdot A|)$ over all sets of real
numbers $A$ with $n$ elements. Here

$$A + A = \{a + b \mid a, b \in A\} \quad \text{and} \quad A \cdot A = \{a \cdot b \mid a, b \in A\}.$$

In particular, they made the very deep conjecture that for any $\varepsilon > 0$ there
exists $c_\varepsilon > 0$ such that for all sets $A$ with sufficiently many elements we have

$$\max(|A + A|, |A \cdot A|) \ge c_\varepsilon \cdot |A|^{2-\varepsilon}.$$


The following result treats the case when $2 - \varepsilon = \frac{5}{4}$. In [72], Solymosi proves


the case 2 — € = #. 


Theorem 6.A.7. (Elekes) For any finite set of real numbers $A$ we have
$$|A + A| \cdot |A \cdot A| \ge \frac{1}{32} |A|^{5/2}.$$


Proof. We may assume that $|A| \ge 2$, the case $|A| = 1$ being trivial.
Consider $P = (A + A) \times (A \cdot A)$ as a set of points in $\mathbb{R}^2$. Let $L$ be the
set of lines $l_{a,b}$ with equations $y = a(x - b)$ ranging over all choices of elements
$a, b$ of $A$. Note that all lines $l_{a,b}$ with $a \in A - \{0\}$ and $b \in A$ are distinct, so
$|L| \ge |A|(|A| - 1)$. Also, $l_{a,b}$ is incident with $(b + c, a \cdot c) \in P$, for any $c \in A$.
Thus we have at least $|A| \cdot |L|$ incidences between $L$ and $P$. Thus, by the
Szemerédi-Trotter theorem, we have

$$|A| \cdot |L| \le |L| + 4\max\left(|P|, (|P| \cdot |L|)^{2/3}\right).$$

If $|P| \ge |L|^2$, then clearly

$$|P| \ge |A|^2 (|A| - 1)^2 \ge \frac{|A|^{5/2}}{32}$$

and we are done. Otherwise, the previous inequality yields

$$|P| \ge \left(\frac{|A| - 1}{4}\right)^{3/2} \sqrt{|L|} \ge \frac{(|A| - 1)^2 \sqrt{|A|}}{8} \ge \frac{1}{32} |A|^{5/2}$$

and the result follows. □
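
To get a feeling for the sum-product phenomenon, the following minimal sketch computes $|A + A|$ and $|A \cdot A|$ for two extreme examples: an arithmetic progression (small sumset, large product set) and a geometric progression (the opposite). The helper names `sumset` and `productset` and the choice $n = 20$ are ours, purely for illustration.

```python
def sumset(a):
    """Return A + A = {x + y : x, y in A}."""
    return {x + y for x in a for y in a}


def productset(a):
    """Return A . A = {x * y : x, y in A}."""
    return {x * y for x in a for y in a}


n = 20
arithmetic = set(range(1, n + 1))       # 1, 2, ..., n
geometric = {2 ** i for i in range(n)}  # 1, 2, 4, ..., 2^(n-1)

for name, a in (("arithmetic", arithmetic), ("geometric", geometric)):
    s, p = len(sumset(a)), len(productset(a))
    # In both cases one of the two sets is much larger than |A|.
    print(name, "|A+A| =", s, "|A.A| =", p, "max =", max(s, p))
```

In both examples one of the two sets has only $2n - 1$ elements, but the other one is much larger, in agreement with the conjecture.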




The variant of the sum-product problem in which instead of sets of real 
numbers one considers subsets of a finite field has generated a huge amount 
of work. The proofs are in general very technical, as the analogue of the 
Szemerédi-Trotter theorem over finite fields is wrong. One has nevertheless 
the following deep theorem [11]. 


Theorem 6.A.8. (Bourgain, Katz, Tao, Glibichuk, Konyagin)
Let $\varepsilon > 0$. There exist positive constants $C, \delta$ such that for any prime $p$
and any $A \subset \mathbb{F}_p$ with $|A| < p^{1-\varepsilon}$ we have

$$\max(|A + A|, |A \cdot A|) \ge C|A|^{1+\delta}.$$

The analogue of the Erdős-Szemerédi conjecture for finite fields would be

$$\max(|A + A|, |A \cdot A|) \ge c(\varepsilon) \min\left(|A|^{2-\varepsilon}, q^{1-\varepsilon}\right)$$

for subsets $A$ of $\mathbb{F}_q$. This is far from being settled. The following theorem can
be deduced from the sum-product theorem. 


Theorem 6.A.9. (Bourgain, Katz, Tao) Let $0 < \alpha < 2$. There exist $\varepsilon > 0$
and $C > 0$ such that there are at most $Cn^{\frac{3}{2} - \varepsilon}$ incidences between $n \le p^\alpha$
points and $n \le p^\alpha$ lines in $\mathbb{F}_p^2$.


Note that the previous theorem no longer holds if one considers subsets
$A$ of $\mathbb{F}_q$, where $q$ is not a prime number. It suffices to take for $A$ a nontrivial
subfield of $\mathbb{F}_q$, for which $|A + A| = |A|$ and $|A \cdot A| = |A|$. Also, it is necessary
to have some upper bound on $|A|$, since otherwise $A = \mathbb{F}_p$ would be again a
counterexample for $p$ large enough.


6.A.4 Some geometric applications 


In this section, we present some nice geometric applications of Szemerédi- 
Trotter’s theorem. The first result deals with the number of triangles of unit 
area spanned by n points in the plane. It is not difficult to construct examples 
in which there are at least $cn^2 \log n$ unit-area triangles, for an absolute constant
c > 0, but it is rather hard to give nontrivial upper bounds for the number 
of such triangles. We will use an easy geometric argument combined with 
corollary 6.A.2 to give such an upper bound. 


Theorem 6.A.10. There exists an absolute constant $c > 0$ with the following
property: the number of triangles of area 1 with vertices among $n$ points in the
plane is smaller than $cn^{7/3}$.


Proof. Let $P$ be a set of $n$ points in the plane. The essential observation is the
following: if $a, b$ are points of $P$, then all points $p \in P$ such that the triangle
$abp$ has area 1 are on the union of two fixed lines $L_{ab}, L'_{ab}$, parallel to $ab$. This
is immediate. Now, consider those pairs $(a, b)$ for which the lines $L_{ab}$ and $L'_{ab}$
contain at most $n^{1/3}$ points of $P$ apiece. Each such pair determines at most $2n^{1/3}$
unit-area triangles and so these pairs contribute at most $2n^2 \cdot n^{1/3} = 2n^{7/3}$
unit-area triangles. On the other hand, by corollary 6.A.2, there are $O(n^{4/3})$
incidences between lines $l$ containing at least $n^{1/3}$ points of $P$ and points of
$P$. However, given such a line $l$, there are $O(n)$ pairs $(a, b)$ such that $l = L_{ab}$
(as for any $a$ and $l$ there are at most two points $b$ such that $l = L_{ab}$). Thus
pairs $(a, b)$ for which at least one of the lines $L_{ab}$ and $L'_{ab}$ contains more than
$n^{1/3}$ points of $P$ also determine $O(n \cdot n^{4/3}) = O(n^{7/3})$ unit-area triangles. The
result follows. □


Using a lot more work, one can improve the previous result to $O(n^{9/4+\varepsilon})$
for any $\varepsilon > 0$ as shown in [22]. The exact order of growth of the number of unit-
area triangles is unknown. If instead one considers the number of triangles
with perimeter 1, the same kind of reasoning (but working with incidences
between ellipses and points) yields a bound $O(n^{16/7})$.

The second application is a famous result of Beck. It is again an easy 
consequence of corollary 6.A.2. 


Theorem 6.A.11. (Beck) For any $n$ points in the plane, either more than
$n \cdot 2^{-14}$ of them are on a line or these points determine at least $2^{-29} n^2$ different
lines.


Proof. Let $S$ be a set of $n$ points in the plane. Call a line average if it contains
between $2^{14}$ and $n/2^{14}$ points of $S$. We claim that there are at most $n^2/2$ pairs
of points of $S$ that determine average lines. Indeed, for each $i \in [14, \log_2 \frac{n}{2^{14}}]$
there are at most $2^9 \cdot \max\left(\frac{n^2}{2^{3i}}, \frac{n}{2^i}\right)$ lines that contain between $2^i$ and $2^{i+1}$
points of $S$, by corollary 6.A.2. Any such line contains at most $4^{i+1}$ pairs of
points of $S$. Thus the number of pairs of points that lie on average lines is
bounded by

$$\sum_{14 \le i \le \log_2 \frac{n}{2^{14}}} 4^{i+1} \cdot 2^9 \max\left(\frac{n^2}{2^{3i}}, \frac{n}{2^i}\right) \le 2^{11} \left(\sum_{14 \le i \le \log_4 n} \frac{n^2}{2^i} + \sum_{\log_4 n < i \le \log_2 \frac{n}{2^{14}}} 2^i n\right)$$

$$< 2^{11} \left(\frac{n^2}{2^{13}} + \frac{n^2}{2^{13}}\right) = \frac{n^2}{4} + \frac{n^2}{4} = \frac{n^2}{2},$$

proving the claim.

So we have at least $n^2/2$ pairs of points of $S$ that do not determine an
average line. Now, we have two possibilities: either some line contains more
than $n/2^{14}$ points of $S$, in which case the result is proved, or actually the $n^2/2$
pairs of points determine only lines that contain less than $2^{14}$ points of $S$. But
then each such line contains at most $2^{28}$ pairs of points of $S$, so there must be
at least $n^2/2^{29}$ such lines. So again the result is proved. □


We continue with another rather natural question: what is the maximal 
number of unit-distances spanned by $n$ points in the plane? Erdős proved
the lower bound $n^{1 + \frac{c}{\log \log n}}$ and the upper bound $cn^{3/2}$ and conjectured that
the true order of growth should be about $n^{1 + \frac{c}{\log \log n}}$. This is far from being
proved, but the following theorem gives a better upper bound. In dimension
3, Clarkson proved that the number of unit distances is $O(n^{3/2+\varepsilon})$ for all $\varepsilon > 0$,
while Erdős, Pach and Hickerson gave examples with at least $cn^{4/3}$ unit
distances. Also, note that already in dimension 4 one might have $cn^2$ unit
distances, as shown by placing $n$ points on $x_1^2 + x_2^2 = 1/2$ and $n$ points on
$x_3^2 + x_4^2 = 1/2$.

Theorem 6.A.12. (Spencer, Szemerédi, Trotter) Let $S$ be a set of $n$ points
in the plane. Then there are at most $16n^{4/3}$ ordered pairs of points $(P, Q)$ in
$S$ such that $PQ = 1$.


Proof. Draw a unit circle around each point of $S$ and consider the multigraph
$G$ with vertex set $S$ and in which $P, Q \in S$ are joined if they are adjacent on
one of these circles. Note that there might be more than one edge between two
vertices and that $G$ might have loops (if there are circles containing exactly
one point of $S$). If $q$ is the number of unit distances between points in $S$,
then $G$ has $q$ edges, for if a circle centered at $P \in S$ contains $x$ points of
$S$, these points contribute $x$ edges to $G$. Now, remove all circles containing
at most two points of $S$, so we get a new multigraph with at least $q - 2n$
edges. We claim that the maximal multiplicity in this multigraph is at most
2. Indeed, if there are at least 3 edges between $P \ne Q \in S$, then there are at
least three circles containing $P, Q$, so the circles of radius 1 centered at $P, Q$
have at least three common points, a contradiction. Next, for every pair of
vertices with exactly two edges between them, remove one edge (arbitrarily),
so that we end up with a simple graph having at least $\frac{q}{2} - n$ edges. Now, if
$q > 10n$, then this simple graph has more than $\max\left(\frac{q}{4}, 4n\right)$ edges and so at
least $(q/4)^3/(64n^2)$ crossings by theorem 6.A.4. But since any two circles have
at most two common points, there cannot be more than $2\binom{n}{2} < n^2$ crossings.
The conclusion is that if $q > 10n$, then $q^3 < 4096n^4$ and the result follows (if
$q \le 10n$, everything is clear). □
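
As a small numerical illustration (not part of the proof), the sketch below counts ordered unit-distance pairs in a $k \times k$ piece of the integer grid and compares the count with the bound $16n^{4/3}$. The function name `unit_distance_pairs` and the choice $k = 6$ are ours; integer coordinates are used so that the test $|PQ| = 1$ is exact.

```python
from itertools import product


def unit_distance_pairs(points):
    """Count ordered pairs (P, Q) of points with |PQ| = 1, using exact integer arithmetic."""
    return sum(
        1
        for (x1, y1) in points
        for (x2, y2) in points
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


k = 6
grid = list(product(range(k), repeat=2))  # the k x k integer grid, n = k^2 points
n = len(grid)
q = unit_distance_pairs(grid)

# Each of the 2k(k-1) grid edges is counted twice (once per orientation).
print("n =", n, "q =", q, "bound 16 n^(4/3) =", 16 * n ** (4 / 3))
```

The square grid is far from extremal (it gives only about $4n$ ordered pairs), so the check is not tight; it merely shows how $q$ is counted.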


We end this addendum with a rather technical result concerning a famous
question of Erdős: what is the least number of distinct distances determined
by $n$ points in the plane? Though rather innocent-looking, this is an extremely
difficult problem. Erdős gave examples of configurations which determine at
most $\frac{cn}{\sqrt{\log n}}$ distances, for some absolute constant $c$. For more than thirty
years, the best result was Moser's bound $cn^{2/3}$, whose proof is a rather tricky, but
very elementary argument. Note that this bound also follows from the previous
theorem: there are at least $\frac{n^2}{2}$ ordered pairs of points and the previous theorem
(combined with a rescaling argument) shows that each distance appears at
most $16n^{4/3}$ times, thus the number of distinct distances is at least $\frac{1}{32} n^{2/3}$. It was
only in 1984 that Chung [15] improved Moser's bound to $cn^{5/7}$. The next big
step was done in the paper [16], where bounds of the form $\frac{n^{4/5}}{(\log n)^c}$ are proved
by rather difficult arguments. After a huge effort, a major breakthrough was
made in [36], where Guth and Katz prove that $n$ points always determine at
least $\frac{cn}{\log n}$ distinct distances. Here, we prove a much weaker estimate, but
which is already highly nontrivial (following [80]).


Theorem 6.A.13. (Székely) Any set of $n$ points in the plane determines at
least $cn^{4/5}$ distinct distances, for an absolute constant $c > 0$.


Proof. From now on c is an absolute constant that will change throughout the 
proof without changing its name. 

Fix $n$ distinct points $P_1, P_2, \ldots, P_n$ in the plane and let $t$ be the number
of distinct distances among them. Let $d_1, d_2, \ldots, d_t$ be these distances. We
may assume that $t < n^{9/10}$, as otherwise we are done. Around each point $P_i$
draw $t$ circles, having radii $d_1, d_2, \ldots, d_t$. Consider now the multigraph whose
vertices are $P_1, \ldots, P_n$, and whose edges are arcs connecting adjacent vertices
on these circles. Note that there are at least $n(n - 1)$ edges in this multigraph
(for each $P_i$, there are $n - 1$ points on the union of the circles centered at $P_i$
and if a circle contains $s$ points, it contributes $s$ edges). As usual, we remove
edges that come from circles containing at most two points. It is easy to see
that this removes at most $2nt$ edges, so we still have $n^2 + o(n^2)$ edges.

Unfortunately, the maximal multiplicity might be too big for the previous 
theorems to yield the desired estimate. The crucial point is to show that there 
cannot be too many edges with very high multiplicity. This is the aim of the 
following lemma. Before stating it, call a pair of points k-rich if there are at 
least k edges between them. 


Lemma 6.A.14. There are at most $\frac{ctn^2}{k^2} + ctn \log n$ edges joining $k$-rich pairs
of points.


Proof. By definition, this number of edges is at most the number of pairs
$(d, e)$, where $d$ is the symmetry axis of an edge $e$ of $G$ such that $d$ contains at
least $k$ vertices. For each $i$ with $2^i \le 4\sqrt{n}$, there are at most $\frac{cn^2}{2^{3i}}$ lines
containing between $2^i$ and $2^{i+1}$
vertices, by corollary 6.A.2. If such a line contains $u$ points then, by definition
of $t$, this line bisects at most $2tu$ edges. So, as long as we are counting pairs
$(d, e)$ such that $d$ contains between $k$ and $4\sqrt{n}$ vertices, the total number of
such pairs is at most

$$\sum_{k/2 \le 2^i \le 4\sqrt{n}} 2t \cdot 2^{i+1} \cdot \frac{cn^2}{2^{3i}}$$

and an easy computation shows that this is bounded by $\frac{ctn^2}{k^2}$. On the other
hand, for each $a > 4\sqrt{n}$, it is easy to see that there are at most $\frac{cn}{a}$ lines
containing between $a$ and $2a$ vertices. These lines yield at most

$$\sum_{2\sqrt{n} \le 2^i \le n} \frac{cn}{2^i} \cdot 2t \cdot 2^{i+1} \le ctn \log n$$

pairs. Combining these two observations finishes the proof. □


Finally, the previous lemma shows that there is an absolute constant $c_1$
such that if we delete all edges joining $c_1\sqrt{t}$-rich pairs, the resulting multigraph $G_1$ has at
least $cn^2$ edges. As clearly $G_1$ has crossing number at most $2n^2t^2$, it follows
from theorem 6.A.6 that

$$2n^2t^2 \ge \frac{cn^6}{n^2 c_1 \sqrt{t}}$$

and the result follows. □


Addendum 6.B Probabilities in 
Combinatorics 


This addendum presents some applications of probabilities in combina- 
torics, or what is commonly called the probabilistic method. This is one of 
the most powerful tools in modern combinatorics, even though, just as with 
the pigeonhole principle, the underlying idea is extremely easy. The probabilistic
method was used at first by Erdős, who proved a great deal of fairly
nontrivial results using rather simple probabilistic arguments. For a much 
deeper study of the method, a canonical reference is the excellent book [2] by 
Alon and Spencer. Many of the examples we are going to discuss are taken 
from this book (however, some of them are left as exercises in the book and 
are not really easy... ). 

We begin by recalling some useful notions from probability theory. In 
discrete combinatorics, one works with finite probability spaces, which makes 
the discussion much more elementary, avoiding subtle issues from measure 
theory. A finite probability space is the data of a finite set $\Omega$ (called the
sample space) and of a map (called the probability distribution) $P : \Omega \to [0, 1]$
such that
$$\sum_{\omega \in \Omega} P(\omega) = 1.$$

One obvious such map is the constant map $\frac{1}{|\Omega|}$, which is called the uniform
distribution. One defines for any subset $A$ of $\Omega$, which we call an event, its
probability as
$$P(A) = \sum_{x \in A} P(x).$$
It is very easy to check that $P(A \cup B) + P(A \cap B) = P(A) + P(B)$ and that the
probability of the complement of $A$ is $1 - P(A)$. Given a probability space, a
random variable $X$ is simply a map $X : \Omega \to \mathbb{R}$. The expectation (or mean
value) of $X$ is

$$E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega) = \sum_{x \in \mathbb{R}} x \cdot P(X = x).$$


Note that the second sum is finite, as the image of $X$ is finite. Another extremely
important notion is that of independence. Some events $A_1, A_2, \ldots, A_k$
are called independent if for all $I \subset \{1, 2, \ldots, k\}$ we have

$$P\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} P(A_i).$$

Two random variables $X, Y$ are called independent if the events $X = a$ and
$Y = b$ are independent for all $a, b$. It is an easy exercise to check that in this
case $E[XY] = E[X]E[Y]$.

The following theorem consists of two elementary, but incredibly powerful 
principles. The first one is basically the principle of the probabilistic method: 
if you want to show the existence of an object with properties $P_1, \ldots, P_k$, it
is enough to prove that there is a probability space $(\Omega, P)$ such that the sum
of the probabilities that an object does not have property $P_i$ is smaller than
1. Of course, the difficult point is constructing the good probability space 
(though in practice the construction is naturally imposed by the statement of 
the problem we are trying to solve) and estimating these probabilities. The 
second part of the theorem is called linearity of expectation and it is also a 
very powerful result, as we will soon see. 


Theorem 6.B.1. Let $(\Omega, P)$ be a finite probability space.
1) For any subsets $A_1, A_2, \ldots, A_k$ of $\Omega$ we have

$$P\left(\bigcup_{i=1}^k A_i\right) \le \sum_{i=1}^k P(A_i),$$
with equality if the events are pairwise disjoint. So, if $\sum_{i=1}^k P(A_i) < 1$,
then $\bigcup_{i=1}^k A_i \ne \Omega$.
2) If $X_1, X_2, \ldots, X_k$ are random variables, then

$$E[X_1 + X_2 + \cdots + X_k] = E[X_1] + E[X_2] + \cdots + E[X_k].$$


Proof. 1) Let $B_1 = A_1$ and let $B_i$ be the complement of $\bigcup_{j=1}^{i-1} A_j$ in $A_i$ for all
$i \ge 2$. Then clearly $\bigcup_i B_i = \bigcup_i A_i$ and the $B_i$'s are pairwise disjoint. But then


6.B. Probabilities in Combinatorics 267 


it follows that $P(B_i) \le P(A_i)$ (as $B_i \subset A_i$ and $P$ takes nonnegative values)
and
$$P\left(\bigcup_i A_i\right) = \sum_i P(B_i) \le \sum_i P(A_i).$$

The rest is immediate.
2) is an easy consequence of the definition of expectation. □
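
Both parts of the theorem are easy to experiment with on a tiny probability space. The sketch below models two fair dice with the uniform distribution and checks sub-additivity and linearity of expectation exactly, using rational arithmetic. The names `omega`, `prob` and `expectation`, and the particular events and variables, are illustrative choices of ours.

```python
from fractions import Fraction
from itertools import product

# Sample space: outcomes of two fair dice, with the uniform distribution.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, len(omega)) for w in omega}


def prob(event):
    """P(A) for the event A = {w : event(w) is true}."""
    return sum(P[w] for w in omega if event(w))


def expectation(X):
    """E[X] = sum over w of X(w) P(w)."""
    return sum(X(w) * P[w] for w in omega)


# Part 1: sub-additivity for A1 = "first die shows 6", A2 = "second die shows 6".
A1 = lambda w: w[0] == 6
A2 = lambda w: w[1] == 6
union = prob(lambda w: A1(w) or A2(w))
print(union, "<=", prob(A1) + prob(A2))

# Part 2: linearity holds even though X2 depends on X1.
X1 = lambda w: w[0]
X2 = lambda w: w[0] * w[1]
lhs = expectation(lambda w: X1(w) + X2(w))
rhs = expectation(X1) + expectation(X2)
print(lhs, "==", rhs)
```

Note that no independence is needed for linearity of expectation, which is exactly what makes it so convenient in the applications that follow.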


When using the probabilistic method, one is naturally confronted with 
estimating probabilities, which sometimes can be quite painful. A simple 
example is that if $(\Omega, P)$ is a probability space and if $X$ is a random variable
on $\Omega$, then the definition gives that there exists $\omega \in \Omega$ such that $X(\omega) \ge E[X]$.
The following two inequalities are also very basic, but useful tools in estimating 
probabilities: 


Theorem 6.B.2. 1) (Markov's inequality) If $X$ is a random variable taking
nonnegative values and if $a > 0$, then

$$P(X \ge a) \le \frac{E[X]}{a}.$$

2) (Chebyshev's inequality) Let $X$ be a random variable and let $a > 0$. Then
$$P(|X - E[X]| \ge a) \le \frac{1}{a^2}\left(E[X^2] - E[X]^2\right).$$


Proof. For the first part, simply note that

$$E[X] = \sum_{x} P(X = x)\, x \ge \sum_{x \ge a} P(X = x)\, a = aP(X \ge a).$$

For the second part, using Markov's inequality, we can write
$$P(|X - E[X]| \ge a) = P\left((X - E[X])^2 \ge a^2\right) \le \frac{1}{a^2} E\left[(X - E[X])^2\right]$$

and an easy computation using linearity of expectation yields the result. □
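
A quick exact check of both inequalities on a concrete example may help fix ideas. The sketch below takes $X$ to be the outcome of one fair die (our choice of example) and tests Markov's and Chebyshev's inequalities for a range of thresholds $a$, again with rational arithmetic.

```python
from fractions import Fraction

values = range(1, 7)  # outcomes of a fair die
P = Fraction(1, 6)    # uniform probability of each outcome

mean = sum(x * P for x in values)               # E[X]
second_moment = sum(x * x * P for x in values)  # E[X^2]
variance = second_moment - mean ** 2            # E[X^2] - E[X]^2


def prob(event):
    """Probability that the die outcome satisfies the predicate `event`."""
    return sum(P for x in values if event(x))


thresholds = [Fraction(k, 2) for k in range(1, 13)]  # a = 1/2, 1, ..., 6
markov_ok = all(prob(lambda x: x >= a) <= mean / a for a in thresholds)
chebyshev_ok = all(
    prob(lambda x: abs(x - mean) >= a) <= variance / a ** 2 for a in thresholds
)
print(mean, variance, markov_ok, chebyshev_ok)
```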




As usual, knowledge comes with practice, so we will spend the remainder 
of this chapter by giving a lot of applications of the previous theorems. We 
start with some applications of the sub-additivity of probabilities. One can 
prove the following inequality using standard techniques, but there is also a 
very elegant probabilistic proof: 


B.2. Let $x_{ij} \in [0, 1]$ for $1 \le i, j \le n$. Prove that

$$\prod_{j=1}^n \left(1 - \prod_{i=1}^n x_{ij}\right) + \prod_{i=1}^n \left(1 - \prod_{j=1}^n (1 - x_{ij})\right) \ge 1.$$


Proof. Consider a random binary matrix $(a_{ij})$ such that $P(a_{ij} = 1) = x_{ij}$
and all these events are independent. Then $\prod_{j=1}^n \left(1 - \prod_{i=1}^n x_{ij}\right)$ is simply the
probability that each column contains at least a zero, while the expression
$\prod_{i=1}^n \left(1 - \prod_{j=1}^n (1 - x_{ij})\right)$ is the probability that each row contains at least a
1. Since all binary matrices have either at least one zero in every column or
at least one one in every row, the result follows. □
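
The probabilistic interpretation can be checked by brute force for a small matrix. The following sketch enumerates all $2^{n^2}$ binary matrices for $n = 3$, computes the two probabilities directly, and compares them with the product formulas. The function name `column_row_probabilities` and the sample values of $x_{ij}$ are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod


def column_row_probabilities(x):
    """Return (P(every column has a 0), P(every row has a 1)) by full enumeration."""
    n = len(x)
    p_cols = p_rows = Fraction(0)
    for bits in product((0, 1), repeat=n * n):
        a = [bits[i * n:(i + 1) * n] for i in range(n)]
        # Probability of this particular matrix under independent entries.
        weight = prod(
            x[i][j] if a[i][j] else 1 - x[i][j]
            for i in range(n) for j in range(n)
        )
        if all(any(a[i][j] == 0 for i in range(n)) for j in range(n)):
            p_cols += weight
        if all(any(a[i][j] == 1 for j in range(n)) for i in range(n)):
            p_rows += weight
    return p_cols, p_rows


x = [[Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)],
     [Fraction(2, 5), Fraction(1, 7), Fraction(1, 2)],
     [Fraction(5, 6), Fraction(2, 3), Fraction(1, 4)]]
n = len(x)
formula_cols = prod(1 - prod(x[i][j] for i in range(n)) for j in range(n))
formula_rows = prod(1 - prod(1 - x[i][j] for j in range(n)) for i in range(n))
p_cols, p_rows = column_row_probabilities(x)
print(p_cols + p_rows >= 1)
```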


Another application of theorem 6.B.1 is the following nice problem: 


B.3. Let $S$ be a finite set of points in a plane, no three of which are collinear.
For each convex polygon $P$ with vertices in $S$, let $a(P)$ be the number
of vertices of $P$¹ and let $b(P)$ be the number of points of $S$ lying outside
$P$ (i.e. outside its interior or border). Prove that for all real numbers $x$,

$$\sum_{P} x^{a(P)} (1 - x)^{b(P)} = 1,$$

where here the sum is taken over possibly degenerate convex polygons
(polygons with 2, 1, or 0 vertices), too.


IMO Shortlist 2006


¹ With the convention that $a(P) = 0, 1, 2$ if $P$ is $\emptyset$, a point of $S$, respectively a segment
connecting two points of $S$.


Proof. It is enough to check the equality for $x \in (0, 1)$, since a polynomial
vanishing on $(0, 1)$ vanishes everywhere. Consider a random two-coloring of
$S$ such that the probability that a point is black is $x$ and all these events
are independent. For a given $P$, the quantity $x^{a(P)} (1 - x)^{b(P)}$ is simply the
probability that the vertices of $P$ are black and the points outside $P$ are white.
As these events are disjoint, the sum over all $P$ is the probability that there
is a polygon with black vertices and such that all points outside it are white.
But this probability is 1, since the convex hull of the black points is such a
polygon (note that there might be 0, 1 or 2 black points and this is why the
sum is also taken over degenerate polygons). □


We continue with two problems which use the principle of the probabilis- 
tic method, namely the fact that if the sum of probabilities of some events 
is (strictly) less than 1, then there is a point in the probability space not 
belonging to any of these events.


B.4. Prove that there is a four-coloring of the set M = {1,2,...,1987} 
such that M contains no monochromatic arithmetic progression with 
10 terms. 


IMO Shortlist 1987 


Proof. Pick a random coloring and observe that the probability that $M$ contains
a monochromatic progression of length at least 10 is bounded by $N/4^9$,
where $N$ is the number of progressions of length 10 contained in $M$ (each
given progression is monochromatic with probability $4 \cdot 4^{-10}$). So the
problem is solved if we prove that $N < 4^9$. But there are $\left\lfloor \frac{1987 - i}{9} \right\rfloor$ progressions
of length 10 contained in $M$ and whose first term is $i$. So
$$N \le \sum_{i=1}^{1987} \frac{1987 - i}{9} < \frac{1}{18} \cdot 2000^2.$$
So, it remains to check that $2^8 \cdot 5^6 < 9 \cdot 2^{19}$, which follows from
$5^6 = (5^3)^2 < (2^7)^2 = 2^{11} \cdot 8 < 9 \cdot 2^{11}$. □
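
The count $N$ is small enough to be computed exactly, which gives an independent check of the estimate. The sketch below counts the 10-term progressions in $\{1, \ldots, 1987\}$ in two ways (by common difference and by first term) and compares the result with $4^9$; the variable names are ours.

```python
M = 1987

# A progression a, a+d, ..., a+9d with d >= 1 lies in {1, ..., M} iff a + 9d <= M,
# so only differences d <= (M - 1) // 9 can occur.
N = sum(
    1
    for d in range(1, (M - 1) // 9 + 1)
    for a in range(1, M + 1)
    if a + 9 * d <= M
)

# Same count, organised by first term i as in the proof.
N_by_first_term = sum((M - i) // 9 for i in range(1, M + 1))

print(N, N_by_first_term, 4 ** 9, N < 4 ** 9)
```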


Recall that the $m$-th Ramsey number $R(m)$ is the smallest positive integer
$n$ such that in any coloring of the edges of $K_n$ (complete graph with $n$ vertices)
with two colors, one can find a monochromatic $K_m$ subgraph. Note that it
is not obvious that this is well defined, but one can prove without too much
difficulty that any coloring of the edges of $K_{\binom{2m-2}{m-1}}$ with two colors contains a
monochromatic $K_m$, so that $R(m)$ is well-defined and at most $\binom{2m-2}{m-1}$ (which
grows roughly as $4^m$). The following result of Erdős from 1947 is an absolute
classic. Amazingly, even after more than 60 years, it remains close to the best
known lower bound (and $4^m$ remains close to the best known upper bound).


B.5. If $2^{\binom{m}{2} - 1} > \binom{n}{m}$, then $R(m) > n$. In particular, $R(m) > 2^{m/2}$ for all
$m \ge 4$.


Erdős


Proof. By definition, saying that $R(m) > n$ is the same as saying that we can
find a coloring of the edges of $K_n$ with no monochromatic $K_m$. Consider a
random coloring of the edges of $K_n$ with two colors, each having probability
1/2. If $S$ is an $m$-element subset of the vertices of $K_n$, let $A_S$ be the event that
the corresponding subgraph of $K_n$ is monochromatic. We want to prove that
there is an element in the probability space (i.e. a 2-coloring) which belongs
to none of the events $A_S$. It is enough to check that

$$\sum_{S} P(A_S) < 1.$$

However, it is clear that we have $\binom{n}{m}$ choices of sets $S$ and for each of them
$P(A_S) = 2^{1 - \binom{m}{2}}$.

Thus, as long as $2^{\binom{m}{2} - 1} > \binom{n}{m}$, we can find the desired coloring. The second
part of the theorem is now easy: taking $n = \lfloor 2^{m/2} \rfloor$, we have (for $m \ge 4$)

$$\binom{n}{m} = \frac{n(n-1) \cdots (n-m+1)}{m!} < \frac{n^m}{m!} < \left(\frac{n}{2}\right)^m \le 2^{\frac{m^2}{2} - m} < 2^{\binom{m}{2} - 1}. \qquad \square$$
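
The numerical condition in B.5 is easy to test with exact integer arithmetic. The sketch below checks that $n = \lfloor 2^{m/2} \rfloor$ satisfies $2^{\binom{m}{2} - 1} > \binom{n}{m}$ for a range of values of $m$; the helper name `erdos_condition` and the range of $m$ are our choices.

```python
from math import comb, isqrt


def erdos_condition(m, n):
    """True if 2^(C(m,2) - 1) > C(n, m), which guarantees R(m) > n."""
    return 2 ** (comb(m, 2) - 1) > comb(n, m)


for m in range(4, 31):
    n = isqrt(2 ** m)  # floor(2^(m/2)), computed without floating point
    print(m, n, erdos_condition(m, n))
```

Using `isqrt(2 ** m)` rather than `int(2 ** (m / 2))` avoids any rounding issue for odd $m$.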


Here is a more delicate problem using similar ideas, but for which it is 
more difficult to find a good probabilistic interpretation. 


B.6. Let $A_1, A_2, \ldots, A_n$ and $B_1, B_2, \ldots, B_n$ be distinct subsets of $\mathbb{N}$ such that
$A_i \cap B_i = \emptyset$ for all $i$ and $(A_i \cap B_j) \cup (A_j \cap B_i) \ne \emptyset$ for all $i \ne j$. Prove
that for all $p \in [0, 1]$

$$\sum_{i=1}^n p^{|A_i|} (1 - p)^{|B_i|} \le 1.$$


Tuza


Proof. Let $X$ be the union of all $A_i$ and $B_i$ and consider a random subset $S$
of $X$ such that the events $x \in S$ for $x \in X$ are independent and of probability
$p$ (more formally, let $\Omega$ be the set of all subsets of $X$ and define

$$P(S) = p^{|S|} (1 - p)^{|X \setminus S|},$$

which is a probability measure on $\Omega$, because $\sum_{S \subset X} P(S) = 1$ by the binomial
theorem). Consider the event $E_i$: $A_i \subset S \subset X - B_i$. By hypothesis, no two
events occur at the same time, thus

$$\sum_{i} P(E_i) = P\left(\bigcup_i E_i\right) \le 1.$$

On the other hand, we have

$$P(E_i) = \sum_{A_i \subset S \subset X - B_i} p^{|S|} (1 - p)^{|X \setminus S|} = p^{|A_i|} (1 - p)^{|B_i|},$$

the last equality being an easy computation left to the reader. Inserting this
in the previous inequality yields the desired result. □


Remark 6.B.3. If all $A_i$ have $a$ elements and all $B_i$ have $b$ elements, we deduce
from Tuza's inequality (by taking $p = \frac{a}{a+b}$) that $n \le \frac{(a+b)^{a+b}}{a^a b^b}$.


We continue with a series of applications of the linearity of expectation 
and Chebyshev’s inequality. First, a very simple problem: 


B.7. Let $p_n(k)$ be the number of permutations of $\{1, 2, \ldots, n\}$ having exactly
$k$ fixed points. Prove that

$$\sum_{k=0}^n k p_n(k) = n!.$$


IMO 1987


Proof. Let $\sigma$ have the uniform distribution on the set of permutations of
$\{1, 2, \ldots, n\}$. If $X(\sigma)$ is the number of fixed points of $\sigma$, then

$$\frac{1}{n!} \sum_{k=0}^n k p_n(k) = E[X].$$

On the other hand,

$$X = \sum_{i=1}^n X_i,$$

where $X_i(\sigma) = 1_{\sigma(i) = i}$. Then

$$E[X_i] = P(\sigma(i) = i) = \frac{1}{n}$$
and the conclusion follows by linearity of expectation. □
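
For small $n$ the identity can be confirmed by listing all permutations. The sketch below computes the total number of fixed points over all permutations of $\{0, \ldots, n-1\}$ (which equals $\sum_k k\, p_n(k)$) and compares it with $n!$; the function name `fixed_point_sum` is ours.

```python
from itertools import permutations
from math import factorial


def fixed_point_sum(n):
    """Total number of fixed points over all permutations of {0, ..., n-1}.

    Grouping permutations by their number k of fixed points shows that this
    total equals the sum over k of k * p_n(k).
    """
    return sum(
        sum(1 for i, v in enumerate(perm) if i == v)
        for perm in permutations(range(n))
    )


for n in range(1, 8):
    print(n, fixed_point_sum(n), factorial(n))
```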


The next two applications are absolute classics. 


B.8. Any graph with $q$ edges contains a bipartite subgraph with at least $q/2$
edges.


Erdős


Proof. Pick a random subset $S$ of the set $V$ of vertices, by including a vertex
in $S$ independently with probability 1/2. Let $X(S)$ be the number of edges $xy$
for which exactly one of $x, y$ is in $S$. For a given edge $xy$, the probability that
exactly one of $x, y$ is in $S$ is 1/2, so by linearity of expectation $E[X] = q/2$.
Thus one can find $S$ such that $X(S) \ge q/2$. By construction, the subgraph
comprising all edges with exactly one vertex in $S$ is a solution. □
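
On a small graph one can average over all subsets $S$ instead of sampling, which makes the argument completely explicit. The sketch below does this for the complete graph $K_5$ (an example of our choosing, with $q = 10$): the average number of edges cut is exactly $q/2$, so some subset cuts at least that many.

```python
from fractions import Fraction
from itertools import combinations


def cut_size(edges, side):
    """Number of edges xy with exactly one endpoint in `side`."""
    return sum(1 for x, y in edges if (x in side) != (y in side))


vertices = range(5)
edges = list(combinations(vertices, 2))  # the complete graph K_5, q = 10 edges

# All 2^5 subsets S, each of probability 1/32 under the random choice in the proof.
subsets = [
    frozenset(v for v in vertices if mask >> v & 1) for mask in range(2 ** 5)
]
average = Fraction(sum(cut_size(edges, s) for s in subsets), len(subsets))
best = max(subsets, key=lambda s: cut_size(edges, s))

print("E[X] =", average, "best cut =", cut_size(edges, best), "with S =", set(best))
```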


For what follows, in a tournament there is exactly one match between each
pair of players and there is no draw. A Hamiltonian path in a tournament is
a permutation $\sigma$ of the players such that player $\sigma(i)$ beats player $\sigma(i + 1)$ for
all $i$.


B.9. There exists a tournament with $n$ players which has at least $\frac{n!}{2^{n-1}}$ Hamiltonian
paths.


Szele


Proof. Pick a random tournament (in which the results of each match occur
with equal probability), and let $X$ be the number of Hamiltonian paths. If $\sigma$ is
a permutation of the players, let $X_\sigma(T)$ be 1 if $\sigma$ induces a Hamiltonian path
in the tournament $T$ and 0 otherwise. It is clear that, as random variables, we
have $X = \sum_\sigma X_\sigma$. It is equally clear that $E[X_\sigma] = 2^{1-n}$ (since the result of
the matches between $\sigma(i)$ and $\sigma(i+1)$ is imposed for all $i$). Thus $E[X] = \frac{n!}{2^{n-1}}$
and the result follows. □
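
For $n = 4$ there are only $2^6 = 64$ tournaments, so the expectation can be computed by exhaustive enumeration. The sketch below (with function and variable names of our choosing) counts Hamiltonian paths in every tournament on 4 players and averages.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial


def hamiltonian_paths(n, beats):
    """Number of orderings p with p[i] beating p[i+1] for all i."""
    return sum(
        all((p[i], p[i + 1]) in beats for i in range(n - 1))
        for p in permutations(range(n))
    )


n = 4
pairs = list(combinations(range(n), 2))
counts = []
for outcome in product((0, 1), repeat=len(pairs)):
    # (a, b) in beats means that player a beats player b.
    beats = {(a, b) if o else (b, a) for (a, b), o in zip(pairs, outcome)}
    counts.append(hamiltonian_paths(n, beats))

average = Fraction(sum(counts), len(counts))
print("E[X] =", average, "n!/2^(n-1) =", Fraction(factorial(n), 2 ** (n - 1)))
print("maximum over all tournaments:", max(counts))
```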


A good exercise for the reader is to prove that the following result im- 
plies (a weak form of) Turán’s famous theorem on graphs without n-complete 
subgraphs. 


B.10. Let $d_1, d_2, \ldots, d_n$ be the degrees of the vertices of a graph. Prove that
one can find a subset $S$ of vertices such that

1) $S$ has at least $\sum_{i=1}^n \frac{1}{d_i + 1}$ elements.

2) There are no edges between vertices in $S$.


Proof. Consider a random permutation $\sigma$ of the vertices of our graph $G$, all
permutations having equal probability $\frac{1}{n!}$. Let $A_i$ be the event: $\sigma(i) < \sigma(j)$
for any neighbor $j$ of $i$. We claim that $P(A_i) = \frac{1}{d_i + 1}$. Indeed, we need to find
the number of permutations $\sigma$ of the vertices such that $\sigma(i) < \sigma(j)$ for any
neighbor $j$ of $i$. If $y_1, \ldots, y_{d_i}$ are the neighbors of $i$, there are $\binom{n}{d_i + 1}$ possibilities
for the set $\{\sigma(i), \sigma(y_1), \ldots, \sigma(y_{d_i})\}$, $d_i!$ ways to permute the elements of this
set (and not $(d_i + 1)!$, since $\sigma(i)$ is the smallest element of the set) and finally
$(n - d_i - 1)!$ ways to permute the remaining vertices. So

$$P(A_i) = \binom{n}{d_i + 1} \frac{(n - d_i - 1)! \, d_i!}{n!} = \frac{1}{d_i + 1},$$

as claimed. If $X$ is the random variable $X(\sigma) = \sum_{i=1}^n 1_{\sigma \in A_i}$, then

$$E[X] = \sum_{i=1}^n P(A_i) = \sum_{i=1}^n \frac{1}{d_i + 1}.$$

Hence one can find $\sigma$ such that

$$X(\sigma) \ge \sum_{i=1}^n \frac{1}{d_i + 1}.$$

It is clear that the set of vertices $i$ such that $\sigma \in A_i$ satisfies both properties.
□
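
The construction in the proof is effectively an algorithm: order the vertices at random and keep those that come before all their neighbours. The sketch below runs it over all orderings of a small graph (a path on 4 vertices, a hypothetical example of ours) and checks that the average size of the selected set is exactly $\sum_i \frac{1}{d_i + 1}$, and that the selected set is always independent.

```python
from fractions import Fraction
from itertools import permutations


def selected(adj, order):
    """Vertices that precede all of their neighbours in the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return {v for v in adj if all(pos[v] < pos[u] for u in adj[v])}


# The path 0 - 1 - 2 - 3, given by adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

orders = list(permutations(adj))
chosen = [selected(adj, order) for order in orders]
average = Fraction(sum(len(s) for s in chosen), len(orders))
bound = sum(Fraction(1, len(neighbours) + 1) for neighbours in adj.values())

# No edge of the graph has both endpoints in a selected set.
independent = all(u not in s for s in chosen for v in s for u in adj[v])
print(average, bound, independent)
```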


We continue with a beautiful proof, due to Lubell, of a famous theorem 
of Sperner on maximal anti-chains. 


B.11. Let $\mathcal{F}$ be a family of subsets of $\{1, 2, \ldots, n\}$ which does not contain two
distinct elements $A, B$ such that $A \subset B$. Then $|\mathcal{F}| \le \binom{n}{\lfloor n/2 \rfloor}$.


Sperner's theorem


Proof. Consider a random permutation $\sigma$ of $\{1, 2, \ldots, n\}$ (with uniform
distribution) and let $A_i$ be the event that $\{\sigma(1), \sigma(2), \ldots, \sigma(i)\} \in \mathcal{F}$. The
hypothesis on $\mathcal{F}$ implies that the random variable $X(\sigma) = \sum_{i=1}^n 1_{\sigma \in A_i}$ satisfies
$X(\sigma) \le 1$ for all $\sigma$, so that its expectation satisfies $E[X] \le 1$. On the other hand,
$E[X] = \sum_{i=1}^n P(A_i)$ and the probability that $\{\sigma(1), \ldots, \sigma(i)\}$ is a given set
with $i$ elements is $\frac{1}{\binom{n}{i}}$, thus $P(A_i) = \frac{n_i}{\binom{n}{i}}$, where $n_i$ is the number of subsets
with $i$ elements in $\mathcal{F}$. Putting this together yields the beautiful inequality

$$\sum_{i=1}^n \frac{n_i}{\binom{n}{i}} \le 1,$$

from which the result follows, as $\binom{n}{\lfloor n/2 \rfloor}$ is the largest among the binomial
coefficients. □
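
For very small $n$, Sperner's theorem can be confirmed by exhaustive search over all families of subsets. The sketch below encodes subsets of $\{0, \ldots, n-1\}$ as bitmasks, finds a largest antichain for $n = 3$ and $n = 4$, and evaluates the sum appearing in the proof. The function names are ours, and the search is only feasible for $n \le 4$ (there are $2^{2^n}$ families).

```python
from fractions import Fraction
from math import comb


def is_antichain(family):
    """True if no member of the family (bitmask-encoded sets) contains another."""
    return all(
        a & b != a and a & b != b
        for i, a in enumerate(family)
        for b in family[i + 1:]
    )


def lym_sum(family, n):
    """Sum over A in the family of 1 / C(n, |A|)."""
    return sum(Fraction(1, comb(n, bin(a).count("1"))) for a in family)


def largest_antichain(n):
    """Brute-force search over all 2^(2^n) families of subsets of an n-set."""
    best = []
    for mask in range(2 ** (2 ** n)):
        family = [s for s in range(2 ** n) if mask >> s & 1]
        if len(family) > len(best) and is_antichain(family):
            best = family
    return best


for n in (3, 4):
    best = largest_antichain(n)
    print(n, len(best), comb(n, n // 2), lym_sum(best, n))
```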


Actually, the proof shows the following more general result, known as
the Lubell-Yamamoto-Meshalkin inequality: let $A_1, A_2, \ldots, A_k$ be subsets of
$\{1, 2, \ldots, n\}$ such that no $A_i$ is a subset of any $A_j$ with $j \ne i$. Then

$$\sum_{i=1}^k \frac{1}{\binom{n}{|A_i|}} \le 1.$$

The following famous theorem of Bollobás generalizes this result further:


B.12. Let $A_1, A_2, \ldots, A_n$ and $B_1, B_2, \ldots, B_n$ be distinct finite sets of positive integers
such that $A_i \cap B_i = \emptyset$ for all $i$, but $A_i \cap B_j \ne \emptyset$ for all $i \ne j$. Then

$$\sum_{i=1}^n \frac{1}{\binom{|A_i| + |B_i|}{|A_i|}} \le 1.$$


Bollobás


Proof. We may assume that all $A_i$'s and $B_i$'s are subsets of $\{1, 2, \ldots, N\}$.
Consider the uniform distribution on the set $S_N$ of all permutations of
$\{1, 2, \ldots, N\}$. Let $E_i$ be the event consisting of all permutations $\sigma$ with the
following property: all elements of $A_i$ come before all elements of $B_i$ in the
list $\sigma(1), \sigma(2), \ldots, \sigma(N)$. It is easy to see that the hypothesis implies that the
events $E_i$ are pairwise disjoint. Thus $\sum_i P(E_i) \le 1$. It remains to notice
that $P(E_i) = \frac{1}{\binom{|A_i| + |B_i|}{|A_i|}}$, as any subset with $|A_i|$ elements of $A_i \cup B_i$ is equally
likely to form the first $|A_i|$ elements of $A_i \cup B_i$ in the list $\sigma(1), \sigma(2), \ldots, \sigma(N)$. The result
follows. □
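
A standard family showing that the inequality cannot be improved is obtained by taking for the $A_i$ all $a$-element subsets of an $(a+b)$-element set and for $B_i$ the complement of $A_i$. The sketch below (with $a = 2$, $b = 3$ and helper names chosen by us) checks the hypotheses and verifies that the sum equals exactly 1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

a, b = 2, 3
ground = set(range(a + b))

# A_i runs over all a-element subsets of the ground set, B_i is its complement.
pairs = [(set(A), ground - set(A)) for A in combinations(sorted(ground), a)]


def satisfies_hypotheses(pairs):
    """A_i and B_j are disjoint exactly when i == j."""
    return all(
        (len(A & B) == 0) == (i == j)
        for i, (A, _) in enumerate(pairs)
        for j, (_, B) in enumerate(pairs)
    )


total = sum(Fraction(1, comb(len(A) + len(B), len(A))) for A, B in pairs)
print(len(pairs), satisfies_hypotheses(pairs), total)
```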


We take a break here to discuss a very nice consequence of Bollobás'
inequality. Recall that an $r$-uniform hypergraph is a pair $(V, E)$, where $V$ is a
set and $E$ is a collection of subsets of $V$, each subset having $r$ elements. Let
$K_t^{(r)}$ be the complete $r$-uniform hypergraph on $t$ vertices.




B.13. Let $G = (V, E)$ be an $r$-uniform hypergraph on $n$ vertices. Suppose that
$G$ contains no $K_t^{(r)}$, but that if we add any $r$-element set to $E$, at least
one $K_t^{(r)}$ appears. Then

\[ |E| \ge \binom{n}{r} - \binom{n-t+r}{r}. \]


Bollobás


Proof. Let $[n] = \{1,2,\dots,n\}$ and let $[n]^{(r)}$ be the collection of $r$-element
subsets of $[n]$. If $A \in [n]^{(r)} - E$, consider (the vertex set of) a copy $C(A)$ of $K_t^{(r)}$ that appears
by adding $A$ to $E$. Then $A \subset C(A)$, yet $A'$ is not contained in $C(A)$ if $A' \ne A$ is in
$[n]^{(r)} - E$. So, if $\{A_1,\dots,A_m\} = [n]^{(r)} - E$ and $B_i$ is the complement of $C(A_i)$,
then $(A_i)_i$ and $(B_i)_i$ satisfy the hypotheses of Bollobás' inequality, yielding
$m \le \binom{n-t+r}{r}$. The result follows. □


We continue with a very nice result from additive number theory. 


B.14. Let $A \subset \mathbb{Z}/n^2\mathbb{Z}$ be a subset with $n$ elements. Prove that there exists a
subset $B \subset \mathbb{Z}/n^2\mathbb{Z}$ with $n$ elements such that $|A + B| \ge \frac{n^2}{2}$.


IMO Shortlist 1999 


Proof. Pick a random collection of $n$ elements of $\mathbb{Z}/n^2\mathbb{Z}$, each of the $n$ elements
being taken with probability $1/n^2$ and all choices being independent. Consolidate
the distinct elements among the $n$ chosen ones in a set $B$, which may
have fewer than $n$ elements. Consider then the random variable $X = |A + B|$.
As

\[ X = \sum_{i \in \mathbb{Z}/n^2\mathbb{Z}} 1_{i \in A+B}, \]

we have by linearity of expectation

\[ E[X] = \sum_{i \in \mathbb{Z}/n^2\mathbb{Z}} P(i \in A+B). \]

On the other hand, the probability that $i \notin A + B$ is clearly the $n$-th power
of the probability that a given chosen element is not in $i - A$, that is

\[ P(i \in A+B) = 1 - \left(1 - \frac{|A|}{n^2}\right)^n = 1 - \left(1 - \frac{1}{n}\right)^n. \]

We deduce that

\[ E[X] = n^2\left(1 - \left(1 - \frac{1}{n}\right)^n\right) \]

and the result follows from the inequality $\left(1 - \frac{1}{n}\right)^n < \frac{1}{2}$ (if $B$ has fewer than $n$
elements, enlarging it arbitrarily to $n$ elements does not decrease $|A+B|$). □


Often, the choices of the probability space and distribution are imposed
by common sense. However, in order to get better quantitative results, one
sometimes uses less natural probability distributions. The following problems
illustrate this:


B.15. Let $V$ be a set with $n$ elements and let $F$ be a family of $m$ subsets of
$V$, each having three elements. Prove that if $3m \ge n$, then there exists
$S \subset V$ having at least $\frac{2n}{3}\sqrt{\frac{n}{3m}}$ elements and such that no element of $F$
is contained in $S$.


Proof. Choose $p \in [0,1]$ and pick a random subset $S$ of $V$ such that
$P(v \in S) = p$ for all $v \in V$, all the events being independent. If $n(S)$ is the number
of elements of $F$ contained in $S$, then linearity of expectation yields

\[ E[|S| - n(S)] = E[|S|] - E[n(S)] = np - mp^3. \]

This is maximal for $p = \sqrt{\frac{n}{3m}} \in [0,1]$ and is then equal to $\frac{2n}{3}\sqrt{\frac{n}{3m}}$. Thus, we can
find $S$ such that $|S| - n(S) \ge \frac{2n}{3}\sqrt{\frac{n}{3m}}$. Choose such an $S$. For each $e \in F$ with
all of its three elements in $S$, delete arbitrarily one of these three elements.
We thus end up with at least $|S| - n(S)$ elements of $V$ which obviously have
the desired property. □


B.16. The minimal degree of the vertices of a graph $G$ with $n$ vertices is $d > 1$.
Prove that there exists a subset $S$ of the vertices such that

1) $|S| \le n \cdot \frac{1 + \ln(d+1)}{d+1}$.

2) Any vertex of $G$ is either in $S$ or a neighbor of a vertex in $S$.


Noga Alon 


Proof. Consider a random subset $S$ of the set of vertices $V$ such that each
vertex is in $S$ independently with probability $p = \frac{\ln(d+1)}{d+1}$. If $S'$ is the set of
vertices which do not belong to $S$ and which have no neighbor in $S$, then $S \cup S'$
satisfies the second condition. It is thus enough to show that

\[ E[|S \cup S'|] \le n \cdot \frac{1 + \ln(d+1)}{d+1}. \]

However,

\[ E[|S \cup S'|] \le E[|S|] + E[|S'|] = \sum_{v} P(v \in S) + \sum_{v} P(v \in S') \le pn + n(1-p)^{d+1} \le n \cdot \frac{1 + \ln(d+1)}{d+1}, \]

the last inequality being equivalent (after dividing by $n$, canceling $p$ and taking
logarithms) to $\log(1 - p) \le -p$, which is well-known. □


A very beautiful but rather subtle application of the probabilistic method
is the following gem due to Erdős:


B.17. Let $A$ be a finite set of nonzero integers. Then $A$ contains a subset $B$
of cardinality greater than $\frac{|A|}{3}$ such that the equation $x + y = z$ has no
solutions in $B \times B \times B$.


Erdős


Proof. The proof is quite tricky. Choose a prime $p = 3k + 2$ large enough, say
such that $p > 2|a|$ for all $a \in A$. The subset $S = \{k+1, k+2, \dots, 2k+1\}$
of $\mathbb{F}_p$ is a sum-free subset, i.e. the equation $x + y = z$ has no solutions with
$x, y, z \in S$. Pick $x \in \mathbb{F}_p$ randomly with uniform probability and consider the
set

\[ A_x = \{a \in A \mid ax \pmod p \in S\}. \]

This is easily seen to be a sum-free subset of $A$. To prove that one can choose
$x$ such that $|A_x| > \frac{|A|}{3}$, it is enough to compute the expected value of $|A_x|$.
Linearity of expectation shows that

\[ E[|A_x|] = \sum_{a \in A} P(ax \pmod p \in S) \]

and it is clear that for any $a \in A$ we have $P(ax \pmod p \in S) = \frac{|S|}{p} = \frac{k+1}{3k+2}$, as
$(ax)_{x \in \mathbb{F}_p}$ is a permutation of $\mathbb{F}_p$. Thus $E[|A_x|] > \frac{1}{3}|A|$ and we are done. □
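The averaging argument above can be turned into a deterministic search: since the average of $|A_x|$ over the nonzero $x$ also exceeds $|A|/3$, trying every nonzero multiplier must succeed. The following Python sketch illustrates this; the function names are hypothetical and the exhaustive loop over $x$ replaces the random choice of the proof.

```python
def is_prime(m):
    """Trial-division primality test (adequate for the small primes used here)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def is_sum_free(b):
    """True if x + y = z has no solution with x, y, z in b."""
    elems = set(b)
    return not any((x + y) in elems for x in elems for y in elems)


def large_sum_free_subset(a):
    """Return a sum-free subset of a (finite, nonzero integers) of size > |a| / 3."""
    a = set(a)
    p = 2 * max(abs(v) for v in a) + 1
    while not (p % 3 == 2 and is_prime(p)):   # prime p = 3k + 2 with p > 2|a|
        p += 1
    k = (p - 2) // 3
    middle = range(k + 1, 2 * k + 2)          # S = {k+1, ..., 2k+1}
    best = set()
    for x in range(1, p):                     # try every nonzero multiplier
        a_x = {v for v in a if (v * x) % p in middle}
        if len(a_x) > len(best):
            best = a_x
    return best


example = {-7, -3, 1, 2, 4, 5, 9, 14}
subset = large_sum_free_subset(example)
print(sorted(subset), is_sum_free(subset))
```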


We end this chapter with a very beautiful application of Chebyshev's
inequality. Before attacking this problem, let us recall a few basic properties
of the variance. Note that for any constant $c$ and any random variable $X$ we
have $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$. Also, it is not difficult to check that if $X, Y$ are
independent variables, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. Finally, if $X$ is
0 with probability $p$ and 1 with probability $1 - p$, then $\mathrm{Var}(X) = p(1 - p)$.


B.18. Let $v_1, v_2, \dots, v_n$ be $n$ vectors in the plane, whose coordinates are integers
of absolute value less than $\frac{1}{100}\sqrt{\frac{2^n}{n}}$. Prove that there are disjoint subsets
$I, J$ of $\{1,2,\dots,n\}$ such that

\[ \sum_{i \in I} v_i = \sum_{j \in J} v_j. \]


Proof. Note first of all that it is enough to find two distinct such subsets $I, J$,
for then it is enough to take out of each the common elements of $I, J$. Now,
assume that we cannot find two such distinct subsets and write $v_i = (x_i, y_i)$.
Consider a random binary sequence $(a_1, a_2, \dots, a_n)$, each $a_i$ being 0 or 1 with
probability $1/2$ (all these events being independent). Consider the random
variables $X = \sum_{i=1}^{n} a_i x_i$ and $Y = \sum_{i=1}^{n} a_i y_i$. The point is that the values
taken by $(X,Y)$ are all different and that a lot of values are concentrated near
the expectation of $(X,Y)$. To be more precise, let $m_x = E[X]$, $m_y = E[Y]$
and observe that the discussion preceding the problem yields

\[ \mathrm{Var}(X) = \mathrm{Var}\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} \frac{x_i^2}{4} \le \frac{2^n}{4 \cdot 10^4} \]

by hypothesis, and similarly for $Y$. Let $\sigma^2 = \frac{2^n}{4 \cdot 10^4}$. Using Chebyshev's inequality, we obtain

\[ P(|X - m_x| \ge 2\sigma) \le \frac{1}{4}, \qquad P(|Y - m_y| \ge 2\sigma) \le \frac{1}{4}, \]

hence

\[ P(|X - m_x| < 2\sigma,\ |Y - m_y| < 2\sigma) \ge \frac{1}{2}. \]

This means that at least half of the (pairwise distinct) points $(X,Y)$ are located
in the square $|X - m_x| < 2\sigma$, $|Y - m_y| < 2\sigma$. But this square contains at
most $(1 + 4\sigma)^2$ lattice points, so

\[ (1 + 4\sigma)^2 \ge 2^{n-1}, \]

which is easily seen to be impossible (as $4\sigma = \frac{2^{n/2}}{50}$, so that $(1 + 4\sigma)^2 < 2^{n-1}$ for $n \ge 2$)
unless $n = 1$, in which case we are easily done. □


Chapter 7 


Complex Combinatorics 


There is a class of combinatorial problems, most often with a number-theoretic
flavor, which have very elegant solutions using finite Fourier transforms
or variants thereof. The main observation is the identity

\[ 1_{n \equiv a \pmod k} = \frac{1}{k}\sum_{j=0}^{k-1} z^{j(n-a)}, \]

which holds for any primitive root of unity $z$ of order $k$. This gives a rather
powerful approach to problems concerning the distribution mod k of the sums 
of subsets of a given set or to tiling problems. Actually, this relation is a 
very special case of a much broader theory, that of representations of finite 
groups. The case of finite abelian groups is particularly easy and useful in 
combinatorial problems and we refer the reader to addendum 7.A for a more 
detailed discussion. For the next problems the previous relation is actually 
sufficient. 


7.1 Tiling and coloring problems 


We start with two tiling problems which have rather elegant solutions 
using complex numbers. Of course, the idea is similar to the usual coloring 
method, but we think it is neater. 




1. Let $k$ be an integer greater than 2. For which odd positive integers $n$
can we tile an $n \times n$ table by $1 \times k$ or $k \times 1$ rectangles such that only the
central unit square is uncovered?


Gabriel Dospinescu 


Proof. Assign coordinates to the squares of the table, with $(0,0)$ assigned to
the upper left corner. Put the number $z^{x+y}$ in square $(x,y)$, where $z$ is a
primitive $k$-th root of unity. Then each $1 \times k$ or $k \times 1$ tile covers a total
value of 0. Thus the total value of the covered squares must be 0. On the other
hand, this total value is precisely $\left(\sum_{j=0}^{n-1} z^j\right)^2 - z^{n-1}$. Thus we must have

\[ \left(\sum_{j=0}^{n-1} z^j\right)^2 - z^{n-1} = 0, \]

so for some $\varepsilon \in \{-1,1\}$ we have $\frac{z^n - 1}{z - 1} = \varepsilon \cdot z^{\frac{n-1}{2}}$. This
can be also written as $\left(z^{\frac{n-1}{2}} - \varepsilon\right)\left(z^{\frac{n+1}{2}} + \varepsilon\right) = 0$, which immediately implies
that $n$ is congruent to 1 or $-1$ (mod $k$). The converse is easily seen to hold,
for if $n \equiv 1 \pmod k$ then we can tile both the center column and the center
row (excluding the center square) using the rectangles, leaving four rectangles
of dimensions multiples of $k$. And if $n \equiv -1 \pmod k$, then we can split the
table into four rectangles each with a dimension a multiple of $k$ and so again
the tiling is possible. Therefore the answer to the problem is: all $n$ for which
$n \equiv \pm 1 \pmod k$. □
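The algebraic criterion used in the proof can be tested numerically. The sketch below (hypothetical helper names, floating-point arithmetic with an assumed tolerance) evaluates the weighted board sum minus the center square and compares its vanishing with the congruence $n \equiv \pm 1 \pmod k$. It checks only this necessary condition, not the existence of tilings.

```python
import cmath


def board_value(n, k):
    """Sum of z^(x+y) over the n x n board minus the center square, z = exp(2*pi*i/k)."""
    z = cmath.exp(2j * cmath.pi / k)
    row = sum(z ** j for j in range(n))
    # The board sum factors as row * row; the center square carries z^(n-1).
    return row * row - z ** (n - 1)


def criterion_holds(n, k, tol=1e-9):
    """True when the weighted sum of the squares to be covered vanishes (up to tol)."""
    return abs(board_value(n, k)) < tol


for k in (3, 4, 5):
    print(k, [n for n in range(1, 20, 2) if criterion_holds(n, k)])
```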


2. Let $n \ge 2$ be an integer. At each point $(i,j)$ having integer coordinates
we write the number $i + j \pmod n$. Find all pairs $(a,b)$ of positive
integers such that any residue modulo $n$ appears the same number of
times on the sides (except for the vertices) of the rectangle with vertices
$(0,0)$, $(a,0)$, $(a,b)$, $(0,b)$ and also any residue modulo $n$ appears the
same number of times in the interior of this rectangle.


Bulgaria 2001 


Proof. First, let us get rid of the easy case n = 2. As the interior of the 
rectangle must have an even number of lattice points, clearly one of a,b must 
be odd. But if one of a,b is odd, then clearly all conditions are satisfied, so 
that in this case the solutions are all pairs (a,b) with a,b not simultaneously 
even. 




Now, let us treat the more difficult case $n > 2$. Let $z$ be a primitive root of
unity of order $n$ and put the number $z^{i+j}$ at the point $(i,j)$. By hypothesis,
the sum of the numbers associated to the interior points of the rectangle is 0, so that
$\sum_{i=1}^{a-1}\sum_{j=1}^{b-1} z^{i+j} = 0$. But this factors as

\[ \left(\sum_{i=1}^{a-1} z^i\right)\left(\sum_{j=1}^{b-1} z^j\right) = 0, \]

so that necessarily one of $a, b$ is congruent to 1 modulo $n$. By symmetry, we
may assume that $a \equiv 1 \pmod n$. Now, since every residue modulo $n$ appears
the same number of times on the sides of the rectangle, we also have

\[ (z^a + 1)\sum_{j=1}^{b-1} z^j + (z^b + 1)\sum_{i=1}^{a-1} z^i = 0. \]

As $a \equiv 1 \pmod n$, the previous equality becomes $(z + 1)\sum_{j=1}^{b-1} z^j = 0$. But
$n > 2$, so that $z + 1 \ne 0$ and so we must have $\sum_{j=1}^{b-1} z^j = 0$ and $b \equiv 1 \pmod n$.
Since it is clear that for $a \equiv b \equiv 1 \pmod n$ all conditions are satisfied, it
follows that this is the solution if $n > 2$. □


The next two problems have very beautiful solutions using the methods 
of this chapter and some classical algebraic identities. 


3. Each element of $M = \{1,2,\dots,n\}$ is colored in either red, blue or yellow.
Let $A$ be the set of triples $(x,y,z) \in M \times M \times M$ such that $n$ divides
$x + y + z$ and $x, y, z$ have the same color. Let $B$ be the set of triples
$(x,y,z) \in M \times M \times M$ such that $n$ divides $x + y + z$ and $x, y, z$ have
pairwise different colors. Prove that $2|A| \ge |B|$.


Chinese TST 2010 


Proof. For simplicity write 1, 2, 3 for the colors and let

\[ f_j(X) = \sum_{i \in M,\ c(i) = j} X^i, \]

where $c(i) = j$ if $i$ has color $j$. The identity

\[ 1_{n \equiv a \pmod k} = \frac{1}{k}\sum_{j=0}^{k-1} z^{j(n-a)} \]

yields

\[ |A| = \sum_{j=1}^{3}\ \sum_{c(x)=c(y)=c(z)=j} \frac{1}{n}\sum_{k=0}^{n-1} e^{\frac{2i\pi k}{n}(x+y+z)} = \frac{1}{n}\sum_{k=0}^{n-1}\sum_{j=1}^{3} f_j\left(e^{\frac{2i\pi k}{n}}\right)^3 \]

and

\[ |B| = \frac{6}{n}\sum_{k=0}^{n-1}\prod_{j=1}^{3} f_j\left(e^{\frac{2i\pi k}{n}}\right). \]

Of course, this method has some limitations, because it is rather difficult
to compare complex numbers. The miracle is in the formula

\[ x^3 + y^3 + z^3 - 3xyz = \frac{1}{2}(x + y + z)\left((x - y)^2 + (y - z)^2 + (z - x)^2\right) \]

and especially in the fact that

\[ \sum_{j=1}^{3} f_j\left(e^{\frac{2i\pi k}{n}}\right) = \sum_{j=1}^{n} e^{\frac{2i\pi kj}{n}} = 0 \]

unless $k = 0$, when it equals $n$.
Putting these observations together, we deduce that

\[ 2|A| - |B| = (f_1(1) - f_2(1))^2 + (f_2(1) - f_3(1))^2 + (f_3(1) - f_1(1))^2 \ge 0. \]

□
The following problem is very similar, but more technically involved. 


4. Color the numbers $1,2,\dots,N$ using 3 colors such that there are at most
$\frac{N}{2}$ numbers of each color. Let $A$ be the set of 4-tuples $(a,b,c,d) \in
\{1,2,\dots,N\}^4$ such that $a+b+c+d \equiv 0 \pmod N$ and $a,b,c,d$ have the
same color. Let $B$ be the set of 4-tuples $(a,b,c,d) \in \{1,2,\dots,N\}^4$ such
that $a+b+c+d \equiv 0 \pmod N$, $a,b$ have the same color and $c,d$ have the same color, but
these two colors are distinct. Prove that $|A| \le |B|$.


KoMaL




Proof. Let us write $z_k = e^{\frac{2i\pi k}{N}}$ and $c(j)$ for the color of number $j$. Define
$f_j(X) = \sum_{c(a) = j} X^a$. As in the previous solution, one ends up with the
formula

\[ |B| - |A| = \frac{1}{N}\sum_{k=0}^{N-1}\Big(2f_1(z_k)^2 f_2(z_k)^2 + 2f_2(z_k)^2 f_3(z_k)^2 + 2f_3(z_k)^2 f_1(z_k)^2 - f_1(z_k)^4 - f_2(z_k)^4 - f_3(z_k)^4\Big). \]

Readers familiar with Heron's formula have already noticed that

\[ 2(a^2b^2 + b^2c^2 + c^2a^2) - (a^4 + b^4 + c^4) = (a+b+c)(b+c-a)(c+a-b)(a+b-c). \]

Also, as in the previous solution we have

\[ f_1(z_k) + f_2(z_k) + f_3(z_k) = \sum_{a=1}^{N} z_k^a \]

and this is nonzero if and only if $k = 0$, in which case it equals $N$. The result
follows from these remarks and the hypothesis, which ensures that

\[ f_1(1) + f_2(1) \ge f_3(1) \]

and similar inequalities. □


7.2 Counting problems 


In this section we combine the technique of generating functions with the
fundamental relation

\[ 1_{n \equiv a \pmod k} = \frac{1}{k}\sum_{j=0}^{k-1} z^{j(n-a)}. \]

This mixture is very powerful and applies very well to counting problems with
arithmetical flavor, as roots of unity can detect congruences very efficiently.




5. Three persons A, B,C play the following game: a subset with k elements 
of the set {1,2,..., 1986} is selected randomly, all selections having the 
same probability. The winner is A, B, or C, according to whether the 
sum of the elements of the selected subset is congruent to 0, 1, or 2 
modulo 3. Find all values of k for which A, B,C have equal chances of 
winning. 


IMO 1987 Shortlist 


Proof. Let $z$ be a primitive third root of unity and consider the polynomial

\[ P(X) = (1 + Xz)(1 + Xz^2)\cdots(1 + Xz^{1986}) = \sum_{j=0}^{1986} a_j X^j. \]

Note that

\[ P(X) = \sum_{I \subset \{1,2,\dots,1986\}} z^{m(I)} X^{|I|}, \]

where $m(I)$ is the sum of elements of $I$. Thus

\[ a_k = \sum_{|I| = k} z^{m(I)} = \sum_{|I|=k,\ 3 \mid m(I)} 1 + \Bigg(\sum_{|I|=k,\ 3 \mid m(I)-1} 1\Bigg) z + \Bigg(\sum_{|I|=k,\ 3 \mid m(I)-2} 1\Bigg) z^2 \]

and this is zero if and only if

\[ \sum_{|I|=k,\ 3 \mid m(I)} 1 = \sum_{|I|=k,\ 3 \mid m(I)-1} 1 = \sum_{|I|=k,\ 3 \mid m(I)-2} 1. \]

Therefore, the problem asks precisely for those $k$ such that $a_k = 0$. Now,
fortunately the polynomial $P$ has a very simple form, since

\[ P(X) = \left((1 + Xz)(1 + Xz^2)(1 + Xz^3)\right)^{662} = (1 + X^3)^{662}, \]

so the nonzero coefficients are precisely those whose index is a multiple of 3.
Thus the answer to the problem is: those $k \equiv 1, 2 \pmod 3$. □
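The same conclusion can be observed by brute force on a smaller instance. The sketch below replaces the set $\{1,\dots,1986\}$ by $\{1,\dots,12\}$ (an assumption made only to keep the enumeration small; 12 is also a multiple of 3, so the same factorization argument applies) and uses hypothetical helper names.

```python
from itertools import combinations


def residue_counts(n, k, modulus=3):
    """Count the k-element subsets of {1..n} by the residue of their sum."""
    counts = [0] * modulus
    for subset in combinations(range(1, n + 1), k):
        counts[sum(subset) % modulus] += 1
    return counts


def is_fair(n, k):
    """True when the three players have equal chances for subsets of size k."""
    return len(set(residue_counts(n, k))) == 1


fair_sizes = [k for k in range(1, 13) if is_fair(12, k)]
print(fair_sizes)
```

The fair sizes are exactly those not divisible by 3, matching the answer obtained from the coefficients of $(1 + X^3)^{n/3}$.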




We present two solutions for the following problem: the first one is the 
standard proof using complex numbers, while the second one is a very elegant 
probabilistic proof. 


6. We roll a regular die n times. What is the probability that the sum of 
the numbers shown is a multiple of 5? 


IMC 1999 


Proof. We need to count the number of sequences $(x_1, x_2, \dots, x_n)$ with
$1 \le x_i \le 6$ and such that 5 divides $x_1 + x_2 + \dots + x_n$. Let $a_j$ be the number of
sequences $(x_1, x_2, \dots, x_n)$ such that $1 \le x_i \le 6$ and $x_1 + x_2 + \dots + x_n \equiv j
\pmod 5$. Then for each fifth root of unity $z$ we have

\[ a_0 + a_1 z + \dots + a_4 z^4 = \sum_{1 \le x_1,\dots,x_n \le 6} z^{x_1 + x_2 + \dots + x_n} = (z + z^2 + \dots + z^6)^n. \]

This equals $z^n$ unless $z = 1$, in which case it equals $6^n$. Adding these relations
for all fifth roots of unity $z$ gives $5a_0 = 6^n + z^n + z^{2n} + z^{3n} + z^{4n}$, for some
primitive root of unity $z$ of order 5. If $n$ is a multiple of 5, $z^n + \dots + z^{4n} = 4$
and the probability is $\frac{a_0}{6^n} = \frac{6^n + 4}{5 \cdot 6^n}$. If not, then

\[ z^n + z^{2n} + z^{3n} + z^{4n} = \frac{z^n - z^{5n}}{1 - z^n} = -1, \]

so that the probability is $\frac{6^n - 1}{5 \cdot 6^n}$. □


Proof. We use a simple fact from probability. Suppose X is an integer-valued 
random variable which is uniformly distributed mod m (that is, takes on every 
value mod m with probability 1/m) and suppose Y is any other integer-valued 
random variable independent of X. Then X +Y is also uniformly distributed 
mod m. 

To use this we break a roll of a die into two steps. We first flip a coin
with a 5/6 probability of giving a heads. If we get a heads, then we roll a fair
5-sided die with the numbers 1, 2, 3, 4, 5 on it. If we get a tails, then we just get
the number 6. If any one of our $n$ coin flips gives a heads (which occurs with
probability $1 - \left(\frac{1}{6}\right)^n$), then one of our summands is uniformly distributed mod
5 and hence the sum is uniformly distributed mod 5. Otherwise, all the coin
flips were tails and hence all the dice were 6's, so that the total is $n \pmod 5$.
Thus the probability of getting a sum which is a multiple of 5 is $\frac{1}{5}\left(1 - \left(\frac{1}{6}\right)^n\right)$
for $n$ not a multiple of 5 and $\frac{1}{5}\left(1 - \left(\frac{1}{6}\right)^n\right) + \left(\frac{1}{6}\right)^n$ for $n$ a multiple of 5. □
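Both solutions lead to the same closed form, which can be compared with an exact computation. The following Python sketch (function names are illustrative) propagates the distribution of the running sum modulo 5 with rational arithmetic.

```python
from fractions import Fraction


def prob_sum_divisible_by_5(n):
    """Exact probability that n rolls of a fair die sum to a multiple of 5."""
    dist = [Fraction(0)] * 5
    dist[0] = Fraction(1)          # the empty sum is 0
    for _ in range(n):
        new = [Fraction(0)] * 5
        for residue, prob in enumerate(dist):
            for face in range(1, 7):
                new[(residue + face) % 5] += prob / 6
        dist = new
    return dist[0]


def closed_form(n):
    """The value obtained in the text, split according to whether 5 divides n."""
    base = Fraction(1, 5) * (1 - Fraction(1, 6) ** n)
    return base + Fraction(1, 6) ** n if n % 5 == 0 else base


for n in (1, 2, 5):
    print(n, prob_sum_divisible_by_5(n), closed_form(n))
```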


The following problem is a good opportunity to recall a very useful result:
if $p$ is a prime and if $a_0, a_1, \dots, a_{p-1}$ are rational numbers such that
$a_0 + a_1 z + \dots + a_{p-1}z^{p-1} = 0$ for some primitive root $z$ of order $p$ of unity, then
$a_0 = a_1 = \dots = a_{p-1}$. This is an immediate consequence of the irreducibility
of $1 + X + \dots + X^{p-1}$ over the rational numbers.


7. Let $p > 2$ be a prime. How many subsets of $\{1,2,\dots,p-1\}$ have the
sum of their elements divisible by $p$?


Ivan Landjev, Bulgaria TST 2006 
Proof. Let $z$ be a primitive root of order $p$ of unity and consider the sum

\[ S = \sum_{A \subset \{1,2,\dots,p-1\}} z^{m(A)}, \]

where $m(A) = \sum_{a \in A} a$. If $x_j$ is the number of subsets $A \subset \{1,2,\dots,p-1\}$
such that $m(A) \equiv j \pmod p$, then clearly

\[ S = x_0 + x_1 z + x_2 z^2 + \dots + x_{p-1}z^{p-1}. \]

On the other hand, we can explicitly compute $S$, since

\[ S = \prod_{i=1}^{p-1}(1 + z^i) = \prod_{\zeta}(1 + \zeta), \]

the product being taken over all roots $\zeta$ of the polynomial $\frac{X^p - 1}{X - 1}$.
We deduce that

\[ \prod_{\zeta}(1 + \zeta) = \frac{(-1)^p - 1}{-1 - 1} = 1. \]

So $x_0 - 1 + x_1 z + \dots + x_{p-1}z^{p-1} = 0$, which implies that $x_0 - 1 = x_1 = \dots =
x_{p-1} = k$ for some $k$. Since $x_0 + x_1 + \dots + x_{p-1}$ is simply the number of
subsets of $\{1,2,\dots,p-1\}$, that is $2^{p-1}$, we deduce that $kp + 1 = 2^{p-1}$ and so
$k = \frac{2^{p-1} - 1}{p}$. Since $x_0 = 1 + k$, the problem is solved. Note that we included
the empty set when counting $x_0$ (by convention the sum of the elements of the
empty set is zero). □
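The answer $1 + \frac{2^{p-1}-1}{p}$ is easy to confirm by enumeration for small primes. A minimal Python sketch, with hypothetical function names:

```python
from itertools import combinations


def count_zero_sum_subsets(p):
    """Number of subsets of {1..p-1} (empty set included) whose sum is divisible by p."""
    total = 0
    for size in range(p):          # sizes 0, 1, ..., p-1
        for subset in combinations(range(1, p), size):
            if sum(subset) % p == 0:
                total += 1
    return total


def formula(p):
    """The value x_0 = 1 + (2^(p-1) - 1) / p derived in the text."""
    return 1 + (2 ** (p - 1) - 1) // p


for p in (3, 5, 7, 11):
    print(p, count_zero_sum_subsets(p), formula(p))
```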


The following problem is technically more involved, but uses the same
ideas as before. The special case when $n$ is a prime was problem 6 of IMO
1995.


8. Prove that the number of subsets with $n$ elements of $\{1,2,\dots,2n\}$ whose
sum is a multiple of $n$ is

\[ \frac{1}{n}\sum_{d \mid n} (-1)^{n+d}\varphi\left(\frac{n}{d}\right)\binom{2d}{d}. \]





Proof. Consider the polynomial

\[ f(X,Y) = (1 + XY)(1 + X^2Y)\cdots(1 + X^{2n}Y). \]

If $m(A)$ is the sum of the elements of $A$, we also have

\[ f(X,Y) = \sum_{a=0}^{2n}\Bigg(\sum_{|A| = a} X^{m(A)}\Bigg) Y^a, \]

where all sets $A$ in this solution are subsets of $\{1,2,\dots,2n\}$. Taking for $X$
the roots of unity of order $n$, say $z_j = e^{\frac{2i\pi j}{n}}$, the fundamental relation implies
that

\[ \frac{1}{n}\left(f(z_1,Y) + \dots + f(z_n,Y)\right) = \sum_{a=0}^{2n} x_a Y^a, \]

where $x_a$ is the number of subsets $A$ with $a$ elements and such that $m(A) \equiv 0
\pmod n$. We deduce that $x_n$, which is the number of sets $A$ we are trying to
find, is also the coefficient of $Y^n$ in the polynomial $\frac{1}{n}\left(f(z_1,Y) + \dots + f(z_n,Y)\right)$.
Now, fix $j$ and let us compute

\[ f(z_j,Y) = (1 + z_jY)(1 + z_j^2Y)\cdots(1 + z_j^{2n}Y). \]

We can write $j = du$, $n = dv$, where $d = \gcd(n,j)$, and then we clearly have

\[ f(z_j,Y) = \left((1 + e^{\frac{2i\pi u}{v}}Y)(1 + e^{\frac{4i\pi u}{v}}Y)\cdots(1 + e^{\frac{2i\pi uv}{v}}Y)\right)^{2d} = \left(1 - (-Y)^v\right)^{2d}. \]

So the coefficient of $Y^n$ is $(-1)^{d+n}\binom{2d}{d}$. Since for any $d \mid n$ there are exactly
$\varphi(n/d)$ integers $1 \le j \le n$ such that $\gcd(j,n) = d$, we deduce that the
coefficient of $Y^n$ in $\frac{1}{n}\left(f(z_1,Y) + \dots + f(z_n,Y)\right)$ is exactly

\[ \frac{1}{n}\sum_{d \mid n} (-1)^{n+d}\varphi\left(\frac{n}{d}\right)\binom{2d}{d}, \]

which finishes the solution. □
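The divisor-sum formula can be compared with a direct enumeration for small $n$. The sketch below uses hypothetical names (`phi`, `brute_force`, `formula`) and a naive totient, which is sufficient at this scale.

```python
from itertools import combinations
from math import comb, gcd


def phi(m):
    """Euler's totient, computed naively."""
    return sum(1 for t in range(1, m + 1) if gcd(t, m) == 1)


def brute_force(n):
    """Number of n-element subsets of {1..2n} whose sum is a multiple of n."""
    return sum(1 for s in combinations(range(1, 2 * n + 1), n) if sum(s) % n == 0)


def formula(n):
    """(1/n) * sum over d | n of (-1)^(n+d) * phi(n/d) * C(2d, d)."""
    total = sum((-1) ** (n + d) * phi(n // d) * comb(2 * d, d)
                for d in range(1, n + 1) if n % d == 0)
    quotient, remainder = divmod(total, n)
    if remainder:
        raise ValueError("divisor sum is not a multiple of n")
    return quotient


for n in range(1, 7):
    print(n, brute_force(n), formula(n))
```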


We continue with a very nice problem concerning the number of solutions 
mod p of a quadratic equation in several variables. Actually, the same methods 
also apply in some other situations, but the general problem of estimating the 
number of solutions mod p of systems of polynomial equations mod p is very 
deep. See the addendum of this chapter for more details. 


9. Let $p$ be an odd prime. Find the number of 6-tuples $(a,b,c,d,e,f)$ of
integers between 0 and $p - 1$ such that

\[ a^2 + b^2 + c^2 \equiv d^2 + e^2 + f^2 \pmod p. \]


MOSP 1997


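Proof. Let $z$ be a primitive root of order $p$ of unity. Since $\sum_{k=0}^{p-1} z^{kr} = 0$ if $r$
is not a multiple of $p$ and equals $p$ otherwise, the desired number of 6-tuples
is

\[ S = \frac{1}{p}\sum_{a,b,c,d,e,f \in \mathbb{Z}/p\mathbb{Z}}\ \sum_{k=0}^{p-1} z^{k(a^2+b^2+c^2-d^2-e^2-f^2)}. \]

Note that

\[ S = \frac{1}{p}\sum_{k=0}^{p-1}\ \sum_{a,b,c,d,e,f \in \mathbb{Z}/p\mathbb{Z}} z^{k(a^2+b^2+c^2-d^2-e^2-f^2)} = \frac{1}{p}\sum_{k=0}^{p-1}\Bigg(\sum_{a \in \mathbb{Z}/p\mathbb{Z}} z^{ka^2}\Bigg)^3\Bigg(\sum_{d \in \mathbb{Z}/p\mathbb{Z}} z^{-kd^2}\Bigg)^3. \]

In the previous sum, there is one obvious term: the one for $k = 0$, which
gives us $p^5$. Also, for each $1 \le k \le p - 1$ we have

\[ \sum_{a \in \mathbb{Z}/p\mathbb{Z}} z^{ka^2} \cdot \sum_{d \in \mathbb{Z}/p\mathbb{Z}} z^{-kd^2} = \sum_{a,d \in \mathbb{Z}/p\mathbb{Z}} z^{k(a^2-d^2)} = \sum_{x,y \in \mathbb{Z}/p\mathbb{Z}} z^{kxy} \]

by the change of variable $a - d = x$, $a + d = y$ (which recovers $a, d$ uniquely
because 2 is invertible modulo $p$). On the other hand, for a fixed $x \ne 0$, we
have

\[ \sum_{y \in \mathbb{Z}/p\mathbb{Z}} z^{kxy} = \sum_{y \in \mathbb{Z}/p\mathbb{Z}} z^{y} = 0, \]

as the map $y \mapsto kxy \pmod p$ is a bijection of $\mathbb{Z}/p\mathbb{Z}$. Therefore, only the terms
with $x = 0$ contribute and $\sum_{x,y \in \mathbb{Z}/p\mathbb{Z}} z^{kxy} = p$, which shows that

\[ \sum_{a \in \mathbb{Z}/p\mathbb{Z}} z^{ka^2} \cdot \sum_{d \in \mathbb{Z}/p\mathbb{Z}} z^{-kd^2} = p. \]

Combining the previous paragraphs yields the answer to the problem, namely
$p^5 + (p - 1)p^2$. □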


Proof. First look at $a^2 - d^2 = (a - d)(a + d)$ mod $p$. Since $p$ is an odd prime,
we can choose $x = a - d$ and $y = a + d$ arbitrarily and then solve uniquely for
$a$ and $d$ via $a = (x + y)/2$, $d = (y - x)/2$. If $x$ is nonzero, then since $p$ is prime,
$xy$ takes on every value exactly once, whereas if $x = 0$ then $xy$ is always zero.
Thus $a^2 - d^2 = xy$ takes on the value zero $2p - 1$ times and takes on every
nonzero value $p - 1$ times. Therefore it is described by the generating function

\[ 2p - 1 + (p-1)X + (p-1)X^2 + \dots + (p-1)X^{p-1} = p + (p-1)(1 + X + X^2 + \dots + X^{p-1}) \]

in $\mathbb{Z}[X]$ modulo $X^p - 1$. Note that modulo $X^p - 1$, the polynomial
$1 + X + X^2 + \dots + X^{p-1}$ has the feature that

\[ f(X)(1 + X + X^2 + \dots + X^{p-1}) = f(1)(1 + X + X^2 + \dots + X^{p-1}) \]

for any polynomial $f(X)$ (as $X - 1$ divides $f(X) - f(1)$). Thus the generating
function for the values taken on by $(a^2 - d^2) + (b^2 - e^2) + (c^2 - f^2)$ is

\[ \left(p + (p-1)(1 + X + X^2 + \dots + X^{p-1})\right)^3 = p^3 + (p^5 - p^2)(1 + X + X^2 + \dots + X^{p-1}). \]

From this we read off the number of 6-tuples for which $a^2 + b^2 + c^2 \equiv d^2 + e^2 + f^2
\pmod p$ as $p^5 + p^3 - p^2$. □
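The count $p^5 + p^3 - p^2$ can be verified directly for small primes. The sketch below (hypothetical function names) avoids enumerating all $p^6$ tuples: it tabulates how often each residue occurs as $a^2+b^2+c^2$ and then pairs equal residues.

```python
def count_solutions(p):
    """Number of (a,b,c,d,e,f) in (Z/pZ)^6 with a^2+b^2+c^2 = d^2+e^2+f^2 (mod p)."""
    sums = [0] * p
    for a in range(p):
        for b in range(p):
            for c in range(p):
                sums[(a * a + b * b + c * c) % p] += 1
    # A solution pairs a triple (a,b,c) with a triple (d,e,f) of equal residue.
    return sum(v * v for v in sums)


def formula(p):
    """The closed form obtained in both proofs."""
    return p ** 5 + p ** 3 - p ** 2


for p in (3, 5, 7):
    print(p, count_solutions(p), formula(p))
```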


We continue with a very nice problem, in which one uses a mixture of 
the previous techniques, algebraic manipulations and a rather tricky number- 
theoretic argument. 


10. Let $p$ be an odd prime. Prove that the $2^{\frac{p-1}{2}}$ numbers $\pm 1 \pm 2 \pm \dots \pm \frac{p-1}{2}$
represent each nonzero residue class mod $p$ the same number of times.
Compute this number.


R. L. McFarland, AMM 6457 
Proof. Let $z = e^{\frac{2i\pi}{p}}$ and write

\[ S = \sum_{\varepsilon_i \in \{-1,1\}} z^{\varepsilon_1 + 2\varepsilon_2 + \dots + \frac{p-1}{2}\varepsilon_{\frac{p-1}{2}}} = a_0 + a_1 z + \dots + a_{p-1}z^{p-1} \]

for some integers $a_i$. Since $z^x$ only depends on $x \pmod p$, it is clear that
$a_i$ is exactly the number of ways residue $i$ is represented by the numbers
$\pm 1 \pm 2 \pm \dots \pm \frac{p-1}{2}$. Thus, the problem asks us to prove that $a_1 = a_2 = \dots = a_{p-1}$
and to find this common value.

The point is that $S$ has a nice closed expression, since it obviously factors as

\[ S = \prod_{j=1}^{\frac{p-1}{2}} (z^j + z^{-j}). \]

On the other hand,

\[ \prod_{j=1}^{p-1}(z^j + z^{-j}) = S \cdot \prod_{j=\frac{p+1}{2}}^{p-1}(z^j + z^{-j}) = S \cdot \prod_{j=1}^{\frac{p-1}{2}}(z^{p-j} + z^{j-p}) = S^2 \]

and

\[ \prod_{j=1}^{p-1}(z^j + z^{-j}) = \prod_{j=1}^{p-1} z^{-j}(1 + z^{2j}) = \prod_{j=1}^{p-1}(1 + z^{2j}) = \prod_{j=1}^{p-1}(1 + z^j) = 1. \]

The last relation uses the fact that $x \mapsto 2x$ is a bijection of the nonzero
remainders modulo $p$ (as $p$ is odd) and that

\[ \prod_{j=1}^{p-1}(1 + z^j) = 1, \]

which follows from

\[ \prod_{j=1}^{p-1}(X - z^j) = \frac{X^p - 1}{X - 1} \]

by taking $X = -1$.

The previous computation shows that $S^2 = 1$, so that $S = \pm 1$ is definitely
an integer. But then the relation $a_0 - S + a_1 z + \dots + a_{p-1}z^{p-1} = 0$ implies
that $a_0 - S = a_1 = \dots = a_{p-1}$. In particular, $a_1 = \dots = a_{p-1}$ and the first
part of the problem is solved.

On the other hand, we clearly have

\[ a_0 + a_1 + \dots + a_{p-1} = 2^{\frac{p-1}{2}}, \]

which combined with $a_0 - S = a_1 = \dots = a_{p-1}$ and with $S = \pm 1$ shows that

\[ S \equiv 2^{\frac{p-1}{2}} \equiv (-1)^{\frac{p^2-1}{8}} \pmod p, \]

so that $S = (-1)^{\frac{p^2-1}{8}}$ and the value we are looking for is

\[ a_1 = a_2 = \dots = a_{p-1} = \frac{2^{\frac{p-1}{2}} - (-1)^{\frac{p^2-1}{8}}}{p}. \]

We used here a standard result in quadratic residues, saying that Legendre's
symbol $\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}}$. □
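For small primes the distribution of the signed sums can simply be enumerated and compared with the values of $a_0$ and $a_1$ found above. A Python sketch with hypothetical function names; the sign $S$ is computed from $p \bmod 8$, which is equivalent to the formula $(-1)^{(p^2-1)/8}$.

```python
from itertools import product


def signed_sum_counts(p):
    """counts[r] = number of sign choices with +-1 +-2 ... +-(p-1)/2 = r (mod p)."""
    half = (p - 1) // 2
    counts = [0] * p
    for signs in product((-1, 1), repeat=half):
        total = sum(s * j for s, j in zip(signs, range(1, half + 1)))
        counts[total % p] += 1
    return counts


def predicted(p):
    """Return (a_0, common value of a_1..a_{p-1}) from the formulas in the text."""
    s = 1 if p % 8 in (1, 7) else -1        # Legendre symbol (2/p)
    common = (2 ** ((p - 1) // 2) - s) // p
    return common + s, common


for p in (5, 7, 11):
    print(p, signed_sum_counts(p), predicted(p))
```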


It is rather difficult to approach the following problem directly with the 
methods of this chapter. However, a small observation reduces the problem 
to a more familiar one. 


11. a) Let $n$ be an odd integer. Find the number of sequences
$(a_0, a_1, \dots, a_n)$ such that $a_i \in \{1,2,\dots,n\}$ for all $i$, $a_n = a_0$ and
$a_i - a_{i-1} \not\equiv i \pmod n$ for all $i = 1,2,\dots,n$.


b) Let $n$ be an odd prime. Find the number of sequences
$(a_0, a_1, \dots, a_n)$ such that $a_i \in \{1,2,\dots,n\}$ for all $i$, $a_n = a_0$ and
$a_i - a_{i-1} \not\equiv i, 2i \pmod n$ for all $i = 1,2,\dots,n$.


Reid Barton, USA TST 2004 
Proof. a) Call a sequence as in the problem admissible. Considering

\[ b_i = a_i - a_{i-1} - i \pmod n \]

for $1 \le i \le n$, the condition becomes $b_i \not\equiv 0 \pmod n$. Note that

\[ b_1 + b_2 + \dots + b_n \equiv -(1 + 2 + \dots + n) \equiv 0 \pmod n, \]

since $n$ is odd. But conversely, if we have a sequence $b_1, b_2, \dots, b_n$ of elements
of $\mathbb{Z}/n\mathbb{Z}$ satisfying $b_i \not\equiv 0 \pmod n$ and $b_1 + b_2 + \dots + b_n \equiv 0 \pmod n$, then we
can find precisely $n$ admissible sequences giving rise to $b_1, b_2, \dots, b_n$. Indeed,
choose any $a_0 \in \{1,2,\dots,n\}$ and take for $a_k$ the remainder modulo $n$ in
$\{1,2,\dots,n\}$ of the number $a_0 + 1 + 2 + \dots + k + b_1 + b_2 + \dots + b_k$. Thus,
if $f(n)$ is the number of sequences $(b_1, b_2, \dots, b_n) \in (\mathbb{Z}/n\mathbb{Z} - \{0\})^n$ such that
$b_1 + b_2 + \dots + b_n \equiv 0 \pmod n$, then the desired answer is $nf(n)$.

Now, we evaluate $f(n)$ in the standard way: let $z = e^{\frac{2i\pi}{n}}$, so that

\[ f(n) = \sum_{b_1,\dots,b_n \in \mathbb{Z}/n\mathbb{Z} - \{0\}} \frac{1}{n}\sum_{k=0}^{n-1} z^{k(b_1 + b_2 + \dots + b_n)}. \]

Since

\[ \sum_{b=1}^{n-1} z^{kb} = n - 1 \]

for $k = 0$ and it equals $\sum_{b=0}^{n-1} z^{kb} - 1 = -1$ for $1 \le k \le n - 1$, we deduce that

\[ f(n) = \frac{(n-1)^n - (n-1)}{n}, \]

so the answer to part (a) is $(n - 1)^n - (n - 1)$.

b) The same argument as in the first paragraph of (a) shows that it is
enough to count the number $g(n)$ of sequences $(b_1, b_2, \dots, b_n)$ with $b_i \in \mathbb{Z}/n\mathbb{Z}$,
$b_i \not\equiv 0, i \pmod n$ and $b_1 + b_2 + \dots + b_n \equiv 0 \pmod n$. The desired answer will
be $ng(n)$.

We compute in the same way

\[ g(n) = \frac{1}{n}\sum_{k=0}^{n-1}\ \sum_{b_i \ne 0, i} z^{k(b_1 + b_2 + \dots + b_n)} = \frac{1}{n}\sum_{k=0}^{n-1}\prod_{i=1}^{n}\Bigg(\sum_{b \not\equiv 0, i \pmod n} z^{kb}\Bigg). \]

For $k = 0$ the corresponding product equals trivially $(n - 2)^{n-1} \cdot (n - 1)$ (the
factor $n - 1$ comes from the sum associated to $i = n$). Now fix $1 \le k \le n - 1$.
For all $1 \le i < n$ we have

\[ \sum_{b \not\equiv 0, i \pmod n} z^{kb} = (1 + z^k + \dots + z^{k(n-1)}) - (1 + z^{ki}) = -(1 + z^{ki}). \]

For $i = n$ the corresponding sum is $-1$. Thus

\[ \prod_{i=1}^{n}\Bigg(\sum_{b \not\equiv 0, i \pmod n} z^{kb}\Bigg) = -\prod_{i=1}^{n-1}(1 + z^{ki}). \]

Now, since $n$ is a prime and $1 \le k < n$, we have $\gcd(k, n) = 1$, so the numbers
$z^{ki}$ with $1 \le i \le n - 1$ are precisely the $n$-th roots of unity different from 1.
Thus

\[ \prod_{i=1}^{n-1}(1 + z^{ki}) = \prod_{u^n = 1,\ u \ne 1}(1 + u) = 1, \]

the last equality being a consequence of the equality

\[ \prod_{u^n = 1,\ u \ne 1}(X - u) = \frac{X^n - 1}{X - 1} \]

and of the fact that $n$ is odd. We deduce that

\[ \prod_{i=1}^{n}\Bigg(\sum_{b \not\equiv 0, i \pmod n} z^{kb}\Bigg) = -1 \]

for all $1 \le k \le n - 1$ and so the answer to the problem is

\[ ng(n) = (n-1)(n-2)^{n-1} - (n-1). \]

□
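Both answers can be confirmed by direct enumeration for the smallest admissible values of $n$. In the sketch below the helper `count_sequences` and its `forbidden` callback are hypothetical names; the enumeration fixes $a_n = a_0$ and is only practical for very small $n$.

```python
from itertools import product


def count_sequences(n, forbidden):
    """Count (a_0..a_n) in {1..n} with a_n = a_0 and a_i - a_{i-1} avoiding forbidden(i) mod n."""
    total = 0
    for seq in product(range(1, n + 1), repeat=n):   # a_0 .. a_{n-1}
        full = seq + (seq[0],)                       # append a_n = a_0
        if all((full[i] - full[i - 1]) % n not in forbidden(i)
               for i in range(1, n + 1)):
            total += 1
    return total


def count_a(n):
    """Part (a): differences must avoid i modulo n."""
    return count_sequences(n, lambda i: {i % n})


def count_b(n):
    """Part (b): differences must avoid i and 2i modulo n."""
    return count_sequences(n, lambda i: {i % n, (2 * i) % n})


print(count_a(3), count_a(5), count_b(3), count_b(5))
```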


The following problem is hard, even though the first steps are rather clear. 
The difficulty lies in the algebraic technicalities. 


12. Let $p > 3$ be a prime number.
If $X$ is a nonempty subset of $\{0,1,\dots,p-1\}$, let $f(X)$ be the number
of sequences $(a_1, a_2, \dots, a_{p-1})$ such that $a_j \in X$ for all $j$ and $p$ divides
$\sum_{j=1}^{p-1} ja_j$. Prove that $f(\{0,1,3\}) \ge f(\{0,1,2\})$, with equality if and
only if $p = 5$.


IMO 1999 Shortlist 


Proof. Let us first find a formula for $f(X)$. Letting $z = e^{\frac{2i\pi}{p}}$, the usual
argument yields

\[ f(X) = \frac{1}{p}\sum_{k=0}^{p-1}\ \sum_{a_1,\dots,a_{p-1} \in X} z^{k(a_1 + 2a_2 + \dots + (p-1)a_{p-1})} = \frac{1}{p}\sum_{k=0}^{p-1}\Bigg(\sum_{x \in X} z^{kx}\Bigg)\cdots\Bigg(\sum_{x \in X} z^{k(p-1)x}\Bigg). \]

In particular,

\[ f(\{0,1,2\}) = \frac{1}{p}\sum_{i=0}^{p-1}\prod_{j=1}^{p-1}(1 + z^{ij} + z^{2ij}). \]

Since $\{z^{ij} \mid 1 \le j \le p-1\} = \{z^j \mid 1 \le j \le p-1\}$ for all $i \not\equiv 0 \pmod p$ and
since $p > 3$, it follows that

\[ f(\{0,1,2\}) = \frac{1}{p}\Bigg(3^{p-1} + (p-1)\prod_{j=1}^{p-1}\frac{z^{3j} - 1}{z^j - 1}\Bigg) = 1 + \frac{3^{p-1} - 1}{p}. \]

We also have

\[ f(\{0,1,3\}) = \frac{1}{p}\sum_{i=0}^{p-1}\prod_{j=1}^{p-1}(1 + z^{ij} + z^{3ij}). \]

So, if $\alpha, \beta, \gamma$ are the roots of $x^3 + x + 1$, we have (using again that for all $i \not\equiv 0
\pmod p$ the numbers $(z^{ij})_j$ are a permutation of the numbers $(z^j)_j$)

\[ f(\{0,1,3\}) = \frac{1}{p}\Bigg(3^{p-1} + (p-1)\prod_{j=1}^{p-1}(z^j - \alpha)(z^j - \beta)(z^j - \gamma)\Bigg) = \frac{3^{p-1} + (p-1)a_p}{p}, \]

where

\[ a_n = \frac{(\alpha^n - 1)(\beta^n - 1)(\gamma^n - 1)}{(\alpha - 1)(\beta - 1)(\gamma - 1)}. \]

It is thus enough to prove that $a_p \ge 1$, with equality precisely for $p = 5$.
However, this sequence satisfies a linear recursive relation with characteristic
polynomial

\[ (X - \alpha)(X - \beta)(X - \gamma)(X - \alpha\beta)(X - \beta\gamma)(X - \alpha\gamma)(X - \alpha\beta\gamma) = (X + 1)(X^3 + X + 1)(X - \alpha\beta)(X - \beta\gamma)(X - \alpha\gamma). \]

We can easily compute that

\[ (X - \alpha\beta)(X - \beta\gamma)(X - \gamma\alpha) = \left(X + \frac{1}{\alpha}\right)\left(X + \frac{1}{\beta}\right)\left(X + \frac{1}{\gamma}\right) = X^3 - X^2 - 1 \]

and so the characteristic polynomial of the sequence $(a_n)_n$ is

\[ (x^3 + x + 1)(x^3 - x^2 - 1)(x + 1) = x^7 - 2x^3 - 2x^2 - 2x - 1. \]

Thus, there exists a constant $C$ such that for all $n$ we have

\[ a_{n+7} = 2(a_{n+3} + a_{n+2} + a_{n+1}) + a_n + C. \]

It is not difficult to compute that the first terms of the sequence $(a_n)_n$ are
$0, 1, 1, 3, 1, 1, 3, 8, \dots$ and that the sequence is increasing after the sixth term
(and $C = -2$). The result follows easily. □
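The two closed forms and the recurrence can be cross-checked numerically. The sketch below uses hypothetical helper names: `f` counts the sequences by a dynamic program over residues, and `a_sequence` generates the terms $a_0, a_1, \dots$ from the initial values and the recurrence with $C = -2$ stated above.

```python
def f(values, p):
    """Number of (a_1..a_{p-1}) with entries in `values` and p dividing sum of j*a_j."""
    counts = [0] * p
    counts[0] = 1
    for j in range(1, p):
        new = [0] * p
        for residue, ways in enumerate(counts):
            if ways:
                for x in values:
                    new[(residue + j * x) % p] += ways
        counts = new
    return counts[0]


def a_sequence(length):
    """a_0, ..., a_{length-1} via a_{n+7} = 2(a_{n+3}+a_{n+2}+a_{n+1}) + a_n - 2."""
    a = [0, 1, 1, 3, 1, 1, 3]
    while len(a) < length:
        n = len(a) - 7
        a.append(2 * (a[n + 3] + a[n + 2] + a[n + 1]) + a[n] - 2)
    return a[:length]


terms = a_sequence(14)
for p in (5, 7, 11, 13):
    print(p, f((0, 1, 2), p), f((0, 1, 3), p), terms[p])
```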


Remark 7.1. Except for the technical combinatorial part, which, as we have
seen, is quite standard, the difficult point of the problem is establishing the
inequality $a_p \ge 1$, with equality precisely for $p = 5$. Proving that $a_p \ge 1$ is
however rather easy, without the computation of the characteristic polynomial.
Indeed, note that $a_p$ is a symmetric polynomial with integer coefficients in the
roots of $X^3 + X + 1$, so it is an integer. On the other hand, the polynomial
function $x^3 + x + 1$ is increasing on the whole real line, so among $\alpha, \beta, \gamma$
precisely one is real, say $\alpha$. Moreover, it is easy to check that $-1 < \alpha < 0$.
Thus $\frac{1 - \alpha^p}{1 - \alpha} > 0$. Also,

\[ \frac{(1 - \beta^p)(1 - \gamma^p)}{(1 - \beta)(1 - \gamma)} = \left|\frac{1 - \beta^p}{1 - \beta}\right|^2 > 0, \]

so that $a_p > 0$. It follows that $a_p \ge 1$. However, we know no easy proof of the
fact that $a_p > 1$ for $p > 5$.





7.3 Miscellaneous problems 


13. Let $a_k, b_k, c_k$ be integers, $k = 1,2,\dots,n$ and let $f(x)$ be the number of
ordered triples $(A, B, C)$ of disjoint subsets (not necessarily nonempty)
of the set $S = \{1,2,\dots,n\}$ whose union is $S$ and for which

\[ \sum_{i \in S \setminus A} a_i + \sum_{i \in S \setminus B} b_i + \sum_{i \in S \setminus C} c_i \equiv x \pmod 3. \]

Suppose that $f(0) = f(1) = f(2)$. Prove that there exists $i \in S$ such
that $3 \mid a_i + b_i + c_i$.


Gabriel Dospinescu 


Proof. Observe that

\[ F(X) = \prod_{i=1}^{n}\left(X^{a_i + b_i} + X^{b_i + c_i} + X^{c_i + a_i}\right) = \sum_{(A,B,C)} X^{\sum_{i \in S \setminus A} a_i + \sum_{i \in S \setminus B} b_i + \sum_{i \in S \setminus C} c_i}. \]

Thus, the equality $f(0) = f(1) = f(2) = 3^{n-1}$ becomes

\[ F(X) \equiv 3^{n-1}(1 + X + X^2) \pmod{X^3 - 1}. \]

Since $1 + X + X^2 \mid X^3 - 1$, this means in particular that $1 + X + X^2 \mid F(X)$.
So there exists $i$ such that $z^{a_i + b_i} + z^{b_i + c_i} + z^{c_i + a_i} = 0$, where $z$ is a primitive
third root of unity. This means that $\{a_i + b_i, b_i + c_i, c_i + a_i\}$ is equivalent
to a permutation of $\{0,1,2\} \pmod 3$ and so $3 \mid a_i + b_i + c_i$. The conclusion
follows. □


Readers who are not familiar with basic linear algebra will have a rather 
hard time trying to solve the following problem. 


14. Let $p$ be an odd prime and $n > 2$. For a permutation $\sigma$ of the set
$\{1,2,\dots,n\}$ define

\[ S(\sigma) = \sigma(1) + 2\sigma(2) + \dots + n\sigma(n). \]

Let $A_j$ be the set of even permutations $\sigma$ such that $S(\sigma) \equiv j \pmod p$ and
let $B_j$ be the set of odd permutations $\sigma$ for which $S(\sigma) \equiv j \pmod p$.
Prove that $n > p$ if and only if $A_j$ and $B_j$ have the same number of
elements for all $j$.


Gabriel Dospinescu 




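Proof. Consider the matrix $A = (X^{ij})_{1\le i,j\le n}$, whose determinant is given by

$$\det A = \sum_{\sigma \text{ even}} X^{S(\sigma)} - \sum_{\sigma \text{ odd}} X^{S(\sigma)}.$$

On the other hand, we also have a simple closed form for $\det A$, given by
Vandermonde's formula:

$$\det A = \prod_{1\le i<j\le n} (X^j - X^i).$$

The first formula and the definition of $A_j$, $B_j$ imply the following congruence
in $\mathbb{Z}[X]$:

$$\sum_{j=0}^{p-1} (|A_j| - |B_j|)X^j \equiv \prod_{1\le i<j\le n} (X^j - X^i) \pmod{X^p-1}.$$

Since the left-hand side has degree less than $p$, it vanishes modulo $X^p-1$ only if
all its coefficients vanish. Therefore, the problem comes down to showing that

$$n > p \iff \prod_{1\le i<j\le n} (X^j - X^i) \equiv 0 \pmod{X^p-1}.$$

This is actually very easy. Indeed, if $n > p$ then clearly

$$X^p - 1 \mid X^{p+1} - X \mid \prod_{1\le i<j\le n} (X^j - X^i).$$

For the other direction, take $z$ any primitive root of unity of order $p$. Then
by hypothesis $\prod_{1\le i<j\le n}(z^j - z^i) = 0$, showing that for some $i < j \le n$ we have
$z^{j-i} = 1$. This implies that $j - i$ is a multiple of $p$, thus at least $p$, so
$n \ge j \ge p+1$ and we are done. □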
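For small values of $n$ and $p$ the statement can be confirmed by enumerating all permutations. The sketch below is only an illustration under our own naming (`signed_counts` is a hypothetical helper): it computes $|A_j| - |B_j|$ for every residue $j$ and reports whether all differences vanish.

```python
from itertools import permutations

def signed_counts(n, p):
    """Return the list of |A_j| - |B_j| for j = 0, ..., p-1."""
    diff = [0] * p
    for perm in permutations(range(1, n + 1)):
        # Parity of the permutation via its number of inversions.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        sign = -1 if inversions % 2 else 1
        s = sum((i + 1) * perm[i] for i in range(n))  # S(sigma)
        diff[s % p] += sign
    return diff

for p in (3, 5):
    for n in range(2, p + 3):
        balanced = all(x == 0 for x in signed_counts(n, p))
        print(p, n, balanced)
```

In each printed row the flag should be true exactly when $n > p$, in agreement with the problem.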


The following problem is very challenging. Here, it is very hard to find 
the correct approach, especially because the problem suggests number theory. 


15. Is there a positive integer $k$ such that $p = 6k + 1$ is a prime and $\binom{3k}{k} \equiv 1 \pmod p$?
USA TST 2010 




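Proof. The answer is negative, but the proof is far from being obvious. Suppose
that $p = 6k+1$ satisfies $\binom{3k}{k} \equiv 1 \pmod p$. Let $g$ be a primitive root mod
$p$ and let $z = g^6$, so that $z$ has order $k$ mod $p$. Consider the sum

$$S = \sum_{i=0}^{k-1} (1+z^i)^{3k} = \sum_{i=0}^{k-1}\left(\sum_{j=0}^{3k}\binom{3k}{j} z^{ij}\right) = \sum_{j=0}^{3k}\binom{3k}{j}\sum_{i=0}^{k-1} z^{ij}.$$

Since $z$ has order $k$ mod $p$, the sum $\sum_{i=0}^{k-1} z^{ij}$ is $0$ mod $p$ unless $j$ is a multiple of $k$
(in which case it equals $k$), which happens if and only if $j = 0, k, 2k, 3k$. Hence

$$S \equiv k\left(\binom{3k}{0} + \binom{3k}{k} + \binom{3k}{2k} + \binom{3k}{3k}\right) = 2k\left(1 + \binom{3k}{k}\right) \pmod p.$$

Since $\binom{3k}{k} \equiv 1 \pmod p$, we deduce that $S \equiv 4k \pmod p$. On the other hand,

$$(1+z^i)^{3k} = (1+z^i)^{\frac{p-1}{2}} \equiv -1, 0, 1 \pmod p.$$

It is now immediate to check that we cannot have $k$ remainders mod $p$, each
of them $-1$, $0$ or $1$, adding up to $4k$ modulo $p$: their sum $s$ satisfies $|s| \le k$, so
$3k \le 4k - s \le 5k$ cannot be a multiple of $p = 6k+1$. This contradiction shows that
no such $k$ can be found and solves the problem. □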
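The negative answer is easy to test experimentally. The following sketch (the helper names and the search limit are our own choices) lists every $k$ below a given bound for which $6k+1$ is prime and $\binom{3k}{k} \equiv 1 \pmod{6k+1}$; by the result just proved, the list should be empty.

```python
from math import comb

def is_prime(m):
    """Trial division, sufficient for the small range used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def counterexamples(limit):
    """All k < limit with 6k + 1 prime and C(3k, k) = 1 (mod 6k + 1)."""
    return [
        k for k in range(1, limit)
        if is_prime(6 * k + 1) and comb(3 * k, k) % (6 * k + 1) == 1
    ]

print(counterexamples(300))
```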


We present two difficult solutions for the following beautiful, but very 
challenging problem. 


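16. Let $p$ be an odd prime and let $a,b,c,d$ be integers not divisible by $p$ such
that

$$\left\{\frac{ra}{p}\right\} + \left\{\frac{rb}{p}\right\} + \left\{\frac{rc}{p}\right\} + \left\{\frac{rd}{p}\right\} = 2$$

for all integers $r$ not divisible by $p$ (here $\{\cdot\}$ is the fractional part). Prove
that at least two of the numbers $a+b$, $a+c$, $a+d$, $b+c$, $b+d$, $c+d$ are
divisible by $p$.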


Kiran Kedlaya, USAMO 1999 




Proof. Let $r(x) \in \{0,1,\dots,p-1\}$ be the remainder of $x$ mod $p$ and observe
that $r(x) = p\cdot\left\{\frac{x}{p}\right\}$. Thus the hypothesis can also be written

$$r(an) + r(bn) + r(cn) + r(dn) = 2p$$

for any $n$ with $\gcd(n,p) = 1$. Let $z$ be a primitive root of unity of order $p$ and let
$a', b', c', d'$ be the inverses of $a,b,c,d$ modulo $p$. Multiplying the previous
relation by $z^{mn}$, we have $\sum r(an)z^{mn} = 2pz^{mn}$ for any integer $m$ (here and below,
all unspecified sums are over $a,b,c,d$). Fix a number
$m$ relatively prime to $p$ and let $n$ run over a system of representatives of the
nonzero classes mod $p$. Then $r(mn)$ runs over $1,2,\dots,p-1$ and so does $r(an)$,
as $a$ and $m$ are not multiples of $p$; moreover, if $j = r(an)$ then $n \equiv a'j \pmod p$ and
$z^{mn} = z^{ma'j}$. Hence if we take the sum over $n$ of the
previous relation, we obtain

$$\sum\left(\sum_{j=1}^{p-1} jz^{ma'j}\right) = 2p\sum_{j=1}^{p-1} z^{j} = -2p.$$

It is easy to check the identity

$$1 + 2X + \dots + nX^{n-1} = \frac{nX^{n+1} - (n+1)X^{n} + 1}{(X-1)^2},$$

which easily implies that

$$\sum_{j=1}^{p-1} jz^{ma'j} = \frac{p}{z^{a'm} - 1}$$

and similarly for $b,c,d$. Thus, we can write

$$\sum \frac{1}{z^{a'm} - 1} = -2$$

and this for all $m$ relatively prime to $p$. Clearing denominators and canceling
equal terms yields (after a tedious computation) an equivalent equality

$$2 + \sum z^{(a'+b'+c')m} = \sum z^{a'm} + 2z^{(a'+b'+c'+d')m},$$

which continues to hold (trivially) when $m$ is a multiple of $p$.
Finally, we add these relations for $0 \le m \le p-1$. Note that

$$\sum_{m=0}^{p-1} z^{mN} \ge 0$$

for any $N$ (simply because the corresponding sum equals $p$ when $z^{N} = 1$ and
$0$ otherwise). We deduce that

$$2\sum_{m=0}^{p-1} z^{m(a'+b'+c'+d')} \ge 2p,$$

which implies that $a'+b'+c'+d'$ is a multiple of $p$ (otherwise the left-hand
side would be $0$). Therefore, by the previous key equality, we also have

$$\sum z^{-a'm} = \sum z^{a'm}$$

for any $m$ relatively prime to $p$ (and trivially for multiples of $p$). By multiplying the previous relation by $z^{-a'm}$
we get

$$1 + z^{m(b'-a')} + z^{(c'-a')m} + z^{m(d'-a')} = z^{-2a'm} + z^{-(a'+b')m} + z^{-(a'+c')m} + z^{-(a'+d')m}$$

and a similar argument (add the equations over $0 \le m \le p-1$, observe that the left-hand side
is at least $p$ and that $\sum_m z^{-2a'm} = 0$ because $p$ is odd) shows that at least one of $a'+b'$, $a'+c'$, $a'+d'$ is a multiple of
$p$. Say $p \mid a'+b'$, then also $p \mid c'+d'$ (as $p \mid a'+b'+c'+d'$) and so $p \mid a+b$ and
$p \mid c+d$, finishing the proof of this hard problem. □
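Before turning to the second solution, the statement itself can be verified exhaustively for small primes. The sketch below uses our own helper names and relies on the reformulation $r(an)+r(bn)+r(cn)+r(dn) = 2p$ from the proof: it enumerates all quadruples of nonzero residues, keeps the good ones, and counts how many of the six pairwise sums are divisible by $p$.

```python
from itertools import product

def is_good(a, b, c, d, p):
    """Hypothesis in remainder form: the four remainders add up to 2p for every n."""
    return all(
        (a * n) % p + (b * n) % p + (c * n) % p + (d * n) % p == 2 * p
        for n in range(1, p)
    )

def pairs_divisible(a, b, c, d, p):
    """Number of pairwise sums among a, b, c, d that are divisible by p."""
    sums = (a + b, a + c, a + d, b + c, b + d, c + d)
    return sum(1 for s in sums if s % p == 0)

def check_prime(p):
    """Return (number of good quadruples, whether the conclusion holds for all of them)."""
    good = [q for q in product(range(1, p), repeat=4) if is_good(*q, p)]
    return len(good), all(pairs_divisible(*q, p) >= 2 for q in good)

for p in (3, 5, 7):
    print(p, check_prime(p))
```

Restricting to representatives in $\{1,\dots,p-1\}$ loses nothing, since both the hypothesis and the conclusion depend only on the residues of $a,b,c,d$ modulo $p$.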




Proof. This second proof uses Lagrange's interpolation theorem, for which we
refer the reader to chapter 11. As in the previous solution,

$$r(x) \in \{0,1,\dots,p-1\}$$

is the remainder of $x$ modulo $p$. Define

$$Q(x) = \frac{2r(x) - r(2x)}{p},$$

so that $Q(x) = 0$ if $0 \le r(x) \le (p-1)/2$ and $Q(x) = 1$ if $(p-1)/2 < r(x) < p$.
Call $(a,b,c,d)$ good if it satisfies the relation in the problem, which can also
be written as $r(ka) + r(kb) + r(kc) + r(kd) = 2p$ for all $1 \le k < p$. Note that if
$(a,b,c,d)$ is good, then so is $(ka, kb, kc, kd)$ for any $k$ which is not a multiple
of $p$. Also, note that if $(a,b,c,d)$ is good, then

$$Q(a) + Q(b) + Q(c) + Q(d) = 2.$$

Combining these two observations, we deduce that

$$Q(ka) + Q(kb) + Q(kc) + Q(kd) = 2$$

for all $1 \le k < p$.

By Lagrange's interpolation theorem, there exists a polynomial $P(X)$ of
degree at most $p-2$ such that $P(x) \equiv Q(x) \pmod p$ for all $x \not\equiv 0 \pmod p$.
Note that if $R(X) = P(X+1) - P(X)$, then $R(x) \equiv 0 \pmod p$ when

$$x = 1, \dots, \frac{p-3}{2}, \frac{p+1}{2}, \dots, p-2$$

and $R((p-1)/2) \not\equiv 0 \pmod p$, so $\deg R = p-3$ and $\deg P = p-2$. On
the other hand, the polynomial $S(X) = P(Xa) + P(Xb) + P(Xc) + P(Xd)$
is congruent to $2$ mod $p$ for $p-1$ values of the variable. Since $\deg S \le p-2$,
$S - 2$ must be the zero polynomial mod $p$. Imposing that the coefficient of
$X^{p-2}$ vanishes in $S$, we obtain the key relation

$$a^{p-2} + b^{p-2} + c^{p-2} + d^{p-2} \equiv 0 \pmod p,$$

which can be also written (by Fermat's little theorem)

$$\frac1a + \frac1b + \frac1c + \frac1d = 0 \in \mathbb{F}_p.$$

Finally, since $r(a) + r(b) + r(c) + r(d) = 2p$, we have $a+b+c+d = 0$ in
$\mathbb{F}_p$. Combining the two relations yields

$$\frac1a + \frac1b + \frac1c = \frac{1}{a+b+c},$$

which readily becomes (after clearing denominators) $(a+b)(b+c)(c+a) = 0$.
By symmetry, we may assume that $a+b = 0$ in $\mathbb{F}_p$. But then $c+d = 0$ also
and we are done. □


We end this chapter with a very deep result, whose proof is however 
elementary (and follows [52]). It improves previous results of Brown and 
Buhler [12], Frankl, Graham and Rödl [33] and finally Ruzsa [52]. For more 
details on some arguments concerning orthogonality relations and properties 
of characters, we refer the reader to addendum 7.A. 


17. Let $d$ be a positive integer and let $S \subset (\mathbb{Z}/3\mathbb{Z})^d$ be a subset containing no line
of the affine space $(\mathbb{Z}/3\mathbb{Z})^d$. Prove that $S$ has at most $\frac{2\cdot 3^d}{d}$ elements.
However, prove that we can find such a set with at least $3^{\frac{2d}{3}-1}$ elements.


Meshulam-Roth theorem 


Proof. Let $f(d)$ be the largest cardinality of a set $S \subset \mathbb{F}_3^d$ (we write $\mathbb{F}_3$ for
$\mathbb{Z}/3\mathbb{Z}$) containing no line. Note that three distinct points of $\mathbb{F}_3^d$ adding up to
the zero vector form a line and conversely, the sum of the points of a line is
the zero vector. So, $f(d)$ is also the largest cardinality of a set $S$ containing
no three distinct elements that add up to $0$. Since $\mathbb{F}_3^d$ and $F = \mathbb{F}_{3^d}$ (the field with $3^d$
elements) are isomorphic as additive groups, we may work from now on with
subsets $S$ of $F$.

In particular, if $S$ is such a set, the only solutions of $s_1+s_2+s_3 = 0$ with $s_i \in S$
are those with $s_1 = s_2 = s_3$, so the orthogonality relations for characters
of abelian groups (theorem 7.A.5) yield

$$\frac{1}{3^d}\sum_{\chi}\Big(\sum_{s\in S}\chi(s)\Big)^3 = \frac{1}{3^d}\sum_{s_1,s_2,s_3\in S}\sum_{\chi}\chi(s_1+s_2+s_3) = |S|,$$




the sum being taken over all characters $\chi$ of $F = \mathbb{F}_{3^d}$. The whole point is to be
able to estimate the previous quantity for each character $\chi$. Let $g(d) = \frac{f(d)}{3^d}$
and take a subset $S$ as in the theorem, but with $|S| = f(d) = 3^d g(d)$. Note
that

$$\sum_{x\in F}\big(1_{x\in S} - g(d-1)\big)\chi(x) = \sum_{s\in S}\chi(s)$$

if $\chi$ is not trivial and it equals $\sum_{s\in S}\chi(s) - 3^d g(d-1)$ otherwise (again by
orthogonality of characters). We deduce that

$$|S| = g(d-1)|S|^2 + \frac{1}{3^d}\sum_{\chi}\Big(\sum_{s\in S}\chi(s)\Big)^2\Big(\sum_{x\in F}\big(1_{x\in S} - g(d-1)\big)\chi(x)\Big)$$

$$\ge g(d-1)|S|^2 - \frac{1}{3^d}\sum_{\chi}\Big|\sum_{s\in S}\chi(s)\Big|^2\cdot\Big|\sum_{x\in F}\big(1_{x\in S} - g(d-1)\big)\chi(x)\Big|.$$

Here comes the crucial estimate:

Lemma 7.2. For all characters $\chi$ we have

$$\Big|\sum_{x\in F}\big(1_{x\in S} - g(d-1)\big)\chi(x)\Big| \le 3^d g(d-1) - |S|.$$


Proof. If $\chi$ is nontrivial, let $H_0$ be its kernel, while if $\chi$ is trivial, let $H_0$ be any
hyperplane of $F$ (where $F$ is seen as $\mathbb{F}_3$-vector space of dimension $d$). Take
$H_1$, $H_2$ two affine hyperplanes parallel to $H_0$ so that the $H_i$'s form a partition
of $F$. Note that by definition $\chi$ is constant on each $H_i$, say $\chi(x) = z_i$ for
$x \in H_i$ (where of course $|z_i| = 1$). Then

$$\Big|\sum_{x\in F}\big(1_{x\in S} - g(d-1)\big)\chi(x)\Big| = \Big|\sum_{i=0}^{2} z_i\big(|S\cap H_i| - f(d-1)\big)\Big| \le \sum_{i=0}^{2}\big||S\cap H_i| - f(d-1)\big|.$$

But clearly $S \cap H_i$ does not contain three distinct elements adding up to $0$ and since
$H_i$ is $(d-1)$-dimensional, we deduce that $|S \cap H_i| \le f(d-1)$ (by definition
of $f(d-1)$). Therefore, the previous estimate yields

$$\Big|\sum_{x\in F}\big(1_{x\in S} - g(d-1)\big)\chi(x)\Big| \le 3^d g(d-1) - \sum_{i=0}^{2}|S\cap H_i| = 3^d g(d-1) - |S|,$$

which is precisely what we wanted. □


Coming back to the proof and using the lemma and Plancherel's identity
(theorem 7.A.5)

$$\sum_{\chi}\Big|\sum_{s\in S}\chi(s)\Big|^2 = 3^d|S|,$$

we deduce the estimate (do not forget that $S$ was chosen such that $|S| = 3^d g(d) = f(d)$)

$$|S| \ge g(d-1)|S|^2 - |S|\big(3^d g(d-1) - |S|\big),$$

which implies that

$$g(d) \le \frac{g(d-1) + 3^{-d}}{g(d-1) + 1}.$$

Since $g(1) = \frac{2}{3}$, it is immediate to check that the last estimate implies $g(d) \le \frac{2}{d}$,
which is precisely what we wanted.

Finally, let us show that $f(3d) \ge 9^d$, which will trivially imply the remaining
part of the theorem. Let $z = 3^d + 1$ and consider

$$S = \{(a, a^z) \mid a \in \mathbb{F}_{9^d}\} \subset \mathbb{F}_{9^d} \times \mathbb{F}_{3^d}.$$

This has $9^d$ elements and we claim that it does not contain three distinct elements
adding up to $0$. Note that $a^z \in \mathbb{F}_{3^d}$ if $a \in \mathbb{F}_{9^d}$, because $(a^z)^{3^d-1} = 1$ if $a \ne 0$.
If three distinct elements $(a_i, a_i^z)$ add up to $0$, we obtain $a_1+a_2+a_3 = 0$ and $a_1^z+a_2^z+a_3^z = 0$. But
then, as $z$ is even,

$$a_1^z + a_2^z + (a_1+a_2)^z = 0.$$




On the other hand, since we work in characteristic 3, we have

$$(a_1+a_2)^z = (a_1+a_2)(a_1+a_2)^{3^d} = (a_1+a_2)\big(a_1^{3^d} + a_2^{3^d}\big) = a_1^z + a_2^z + a_1a_2^{3^d} + a_2a_1^{3^d}.$$

We deduce that

$$(a_1-a_2)^z = (a_1-a_2)\big(a_1^{3^d} - a_2^{3^d}\big) = 0,$$

a contradiction, since $a_1 \ne a_2$. □
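Two steps of the argument above lend themselves to a quick machine check. The sketch below is illustrative only: all names are ours, and the model $\mathbb{F}_9 = \mathbb{F}_3[i]/(i^2+1)$ is an assumption made for the example. First it iterates the recursion $u(d) = \frac{u(d-1)+3^{-d}}{u(d-1)+1}$ from $u(1) = \frac23$ (which dominates $g(d)$, because $t \mapsto \frac{t+3^{-d}}{t+1}$ is increasing) and compares it with $\frac2d$. Second it builds the set $\{(a, a^z)\}$ for $d = 1$, $z = 4$, inside $\mathbb{F}_9\times\mathbb{F}_3 \cong \mathbb{F}_3^3$ and checks that no three distinct points add up to zero.

```python
from fractions import Fraction
from itertools import combinations

def density_bounds(d_max):
    """u(1) = 2/3 and u(d) = (u(d-1) + 3^-d) / (u(d-1) + 1), computed exactly."""
    u = {1: Fraction(2, 3)}
    for d in range(2, d_max + 1):
        u[d] = (u[d - 1] + Fraction(1, 3 ** d)) / (u[d - 1] + 1)
    return u

def mul(u, v):
    """Product in F_9 = F_3[i]/(i^2 + 1); the pair (x, y) stands for x + y*i."""
    (x1, y1), (x2, y2) = u, v
    return ((x1 * x2 - y1 * y2) % 3, (x1 * y2 + x2 * y1) % 3)

def power(u, e):
    result = (1, 0)
    for _ in range(e):
        result = mul(result, u)
    return result

def construction_d1():
    """Points (a, a^4) with a in F_9, written as vectors of F_3^3."""
    points = []
    for x in range(3):
        for y in range(3):
            value = power((x, y), 4)
            assert value[1] == 0  # a^4 lies in the subfield F_3
            points.append((x, y, value[0]))
    return points

def has_zero_sum_triple(points):
    return any(
        all((p[k] + q[k] + r[k]) % 3 == 0 for k in range(3))
        for p, q, r in combinations(points, 3)
    )

bounds = density_bounds(40)
print(all(bounds[d] <= Fraction(2, d) for d in bounds))
print(len(set(construction_d1())), has_zero_sum_triple(construction_d1()))
```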


7.4 Notes 


We thank the following people for providing solutions to the problems 
discussed in this chapter: Mitchell Lee (problems 1, 2, 10, 15), Richard Stong 
(problems 6, 9), Qiaochu Yuan (problems 5, 6, 8), Victor Wang (problem 16), 
Alex Zhu (problems 15, 16), Gjergji Zaimi (problems 12, 13, 14). 


Addendum 7.A Finite Fourier Analysis 


The identity 


$$\frac{1}{n}\sum_{k=0}^{n-1} e^{\frac{2i\pi k(a-b)}{n}} = 1_{a\equiv b \pmod n}$$


(for all integers a,b and all positive integers n) is at the origin of a lot of 
beautiful mathematical results, as we could see in chapter 7, but it is just 
a special case of a broader theory. In the first part of this addendum we 
will see a vast generalization of this relation, through Fourier analysis on 
finite abelian groups. Though rather elementary, these results are extremely 
useful in number theory and combinatorics. On the other hand, they are the 
very first component of a much broader picture, that of harmonic analysis. 
Things are much easier for finite abelian groups, since one avoids quite a lot of 
technicalities that appear when dealing with harmonic analysis on other groups 
(such as R, the unit circle in C or, much harder, non-abelian and non-compact 
topological groups). For instance, all convergence issues are automatic (as all 
sums we deal with are finite), while the integration theory has no subtlety 
(on the other hand, if one wants to do harmonic analysis on R, one has to 
be very careful with these issues and one has to develop a rather powerful 
integration theory first). Also, the fact that the group is abelian simplifies the 
problem considerably, basically because all irreducible complex representations 
of such a group are one-dimensional. We will also recall the main features of 
Fourier analysis on finite general groups, without proving anything, since this 
is already the content of a course in representation theory. Next, we deal with 
a classical, but amazing application of these ideas, namely Dirichlet’s theorem. 
This allows us to say a few words about L-functions of Dirichlet characters. 
This is again just the tip of a huge iceberg, which is far from being understood, 
even though progress is constantly being made. 


7.A.1 Dual group 


From now on, let $G$ be a finite abelian group with $n$ elements. We want
to define the Fourier transform of a function $f : G \to \mathbb{C}$. Recall that if $f$ is an
integrable function on $\mathbb{R}$, with complex values, its Fourier transform is defined
by

$$\hat{f}(x) = \int_{\mathbb{R}} f(y)e^{-2i\pi xy}\,dy.$$

The presence of the characters $y \mapsto e^{2i\pi xy}$ of $\mathbb{R}$ will be our guide to Fourier
analysis on abelian groups.


Remark 7.A.1. When dealing with abelian groups (which will always be the 
case for us), one also denotes by + the internal operation of G. This is quite 
intuitive when dealing with groups such as Z/nZ, but definitely not suitable for 
groups such as (Z/nZ)*. Our convention will be the following: when dealing 
with an abstract abelian group G, we will use multiplicative notation for the 
internal operation of the group, while in concrete examples we will choose the 
most intuitive notation, depending on the situation. Hopefully, this will not 
create too much confusion... 


A character of $G$ is a morphism of groups $\chi : G \to \mathbb{C}^*$. So, according to
whether we use additive or multiplicative notation for the internal operation
of $G$, a character satisfies $\chi(x+y) = \chi(x)\cdot\chi(y)$ for all $x,y \in G$ (respectively
$\chi(xy) = \chi(x)\cdot\chi(y)$). The character is called trivial if $\chi(g) = 1$ for all $g \in G$.
Let $\hat{G}$ be the set of all characters of $G$. It becomes a group with respect to the
obvious multiplication $(\chi_1\cdot\chi_2)(g) = \chi_1(g)\chi_2(g)$ and it is called the dual group
of $G$. Note that for all $\chi \in \hat{G}$ and $g \in G$ we have $\chi(g)^n = \chi(g^n) = 1$, because
$g^n = 1$ by Lagrange's theorem. In particular, $|\chi(g)| = 1$ and $\chi^{-1}(g) = \overline{\chi(g)}$
for all $g \in G$ and $\chi \in \hat{G}$. The idea of harmonic analysis is that all information
about the space of $\mathbb{C}$-valued functions on $G$ is encoded in $\hat{G}$.


Example 7.A.2. Take $n \ge 2$ and $G = \mathbb{Z}/n\mathbb{Z}$. What is the dual group of $G$? We
saw that $\chi(1)$ is an $n$th root of unity. And clearly $\chi$ is uniquely determined by
$\chi(1)$, as $G$ is generated by $1$. Conversely, if $z$ is an $n$th root of unity, $x \mapsto z^x$
defines a character of $G$ (by $z^x$ we mean $z^a$ for any lifting $a$ of $x$; this does
not depend on the choice of $a$, as $z^n = 1$). We deduce that $\hat{G}$ is isomorphic
to $\mathbb{Z}/n\mathbb{Z}$, even though this isomorphism depends on the choice of a primitive
$n$th root of $1$, so it is not really canonical.





It is an easy exercise to check that passing to duals is compatible with
direct products (i.e. the dual of $G \times H$ is $\hat{G}\times\hat{H}$). Since any finite abelian group
is a direct product of cyclic groups (this is a nontrivial, but absolutely classical
result), it follows from this remark and the previous example that for any finite
abelian group $G$, its dual is isomorphic to $G$. In particular, $G$ and its dual have
the same number of elements, which is really not obvious at first sight. We also
deduce from this observation and example 7.A.2 that if $x \in G - \{1\}$, then there
exists $\chi \in \hat{G}$ such that $\chi(x) \ne 1$ (if you think about this for a moment, you will
realize that it is nontrivial!). Thus, the map $g \mapsto (\chi \mapsto \chi(g))$ is an injective
homomorphism $G \to \hat{\hat{G}}$, realizing $G$ as subgroup of $\hat{\hat{G}}$. Since $|\hat{\hat{G}}| = |G|$, the
previous injection has to be an isomorphism. So $G$ is canonically isomorphic
to its double dual. Let us now glorify what we have just proved:


Theorem 7.A.3. If $G$ is a finite abelian group, then $\hat{G}$ is an abelian group
isomorphic to $G$ and $G$ is canonically isomorphic to $\hat{\hat{G}}$.


Let $F(G,\mathbb{C})$ be the $\mathbb{C}$-vector space of all maps $f : G \to \mathbb{C}$. It is a $\mathbb{C}$-vector
space of dimension $|G|$, since the map $F(G,\mathbb{C}) \to \mathbb{C}^{|G|}$ sending $f$ to $(f(g))_{g\in G}$
is obviously a $\mathbb{C}$-linear isomorphism. If $f,g \in F(G,\mathbb{C})$, let

$$\langle f,g\rangle = \frac{1}{|G|}\sum_{x\in G} f(x)\overline{g(x)}.$$

It is easy to check that this is an inner product on the $\mathbb{C}$-vector space $F(G,\mathbb{C})$.
We are now ready to prove the main theorems of Fourier analysis on $G$.


Theorem 7.A.4. The elements of $\hat{G}$ form an orthonormal basis of $F(G,\mathbb{C})$.


Proof. We split the proof in several steps.

Step 1. We prove that $\langle\chi_1,\chi_2\rangle = 1_{\chi_1=\chi_2}$, i.e. that $(\chi)_{\chi\in\hat{G}}$ is an orthonormal
set. If $\chi_1 = \chi_2$, everything follows from the fact that $\chi(g)$ has magnitude
$1$ for any $g$ and any character $\chi$. Assume that $\chi = \frac{\chi_1}{\chi_2}$ is not trivial. Then

$$\langle\chi_1,\chi_2\rangle = \frac{1}{|G|}\sum_{x\in G}\chi_1(x)\overline{\chi_2(x)} = \frac{1}{|G|}\sum_{x\in G}\chi(x).$$

Let $S = \sum_{x\in G}\chi(x)$. Then for all $g \in G$ we have

$$\chi(g)S = \sum_{x\in G}\chi(gx) = \sum_{x\in G}\chi(x) = S,$$

because $x \mapsto gx$ is a permutation of $G$. Since $\chi$ is not trivial, there is $g$ such
that $\chi(g) \ne 1$ and the previous identity yields $S = 0$ and so $\langle\chi_1,\chi_2\rangle = 0$.

Step 2. We claim that for all $x \in G - \{1\}$ we have $\sum_{\chi\in\hat{G}}\chi(x) = 0$. The
same argument as in the first step shows that it is enough to prove that there
exists $\chi \in \hat{G}$ such that $\chi(x) \ne 1$. This is nontrivial, but has already been
explained.

Step 3. Since $(\chi)_{\chi\in\hat{G}}$ is an orthonormal set, it is linearly independent.
But since it has the same cardinality as the dimension of the vector space
$F(G,\mathbb{C})$ (namely $|G|$, by theorem 7.A.3), basic theory of vector spaces shows
that it has to be a basis of $F(G,\mathbb{C})$. The result follows. □
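For the cyclic group $G = \mathbb{Z}/n\mathbb{Z}$ of example 7.A.2 the orthonormality can be observed numerically. The following minimal sketch (with our own function names, and $n = 6$ chosen arbitrarily) builds the characters $x \mapsto e^{2i\pi kx/n}$ and computes the Gram matrix of the inner product defined above; up to rounding it should be the identity matrix.

```python
import cmath

def character(k, n):
    """The character x -> exp(2*i*pi*k*x/n) of Z/nZ."""
    return lambda x: cmath.exp(2j * cmath.pi * k * x / n)

def inner(f, g, n):
    """<f, g> = (1/n) * sum over x of f(x) * conjugate(g(x))."""
    return sum(f(x) * g(x).conjugate() for x in range(n)) / n

n = 6
gram = [
    [inner(character(j, n), character(k, n), n) for k in range(n)]
    for j in range(n)
]
for row in gram:
    print([round(abs(entry), 6) for entry in row])
```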


In practice, one really uses the following consequence of the previous 
theorem: 


Theorem 7.A.5. For any finite abelian group G, the following relations hold: 


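1) (orthogonality relations) For all $\chi, \chi_1, \chi_2 \in \hat{G}$ and $g,h \in G$

$$\frac{1}{|G|}\sum_{x\in G}\chi_1(x)\overline{\chi_2(x)} = 1_{\chi_1=\chi_2}, \qquad \frac{1}{|G|}\sum_{\chi\in\hat{G}}\chi(g)\overline{\chi(h)} = 1_{g=h}.$$

2) (Fourier inversion) For all $f \in F(G,\mathbb{C})$ we have $f = \sum_{\chi\in\hat{G}}\langle f,\chi\rangle\chi$.
3) (Plancherel's identity) For all $f \in F(G,\mathbb{C})$

$$\frac{1}{|G|}\sum_{x\in G}|f(x)|^2 = \sum_{\chi\in\hat{G}}|\langle f,\chi\rangle|^2.$$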


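Proof. 1) has already been proved while proving the previous theorem. For
2), note that for any $x \in G$ we have, by the orthogonality relations

$$\Big(\sum_{\chi}\langle f,\chi\rangle\chi\Big)(x) = \frac{1}{|G|}\sum_{\chi}\chi(x)\sum_{y} f(y)\overline{\chi(y)} = \frac{1}{|G|}\sum_{y} f(y)\sum_{\chi}\chi(x/y) = f(x),$$

from which the result follows. Finally, using 2) and the previous theorem, we
can write

$$\frac{1}{|G|}\sum_{x\in G}|f(x)|^2 = \langle f,f\rangle = \sum_{\chi_1}\sum_{\chi_2}\langle f,\chi_1\rangle\overline{\langle f,\chi_2\rangle}\langle\chi_1,\chi_2\rangle = \sum_{\chi\in\hat{G}}|\langle f,\chi\rangle|^2.$$ □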
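Fourier inversion and Plancherel's identity are just as easy to test on $\mathbb{Z}/n\mathbb{Z}$. In the sketch below the sample function `f` and all helper names are hypothetical choices made for the illustration; the coefficients are computed with the normalization $\langle f,\chi\rangle = \frac{1}{|G|}\sum_x f(x)\overline{\chi(x)}$ used in the text.

```python
import cmath

def chi(k, x, n):
    """Value at x of the k-th character of Z/nZ."""
    return cmath.exp(2j * cmath.pi * k * x / n)

def fourier_coefficients(f, n):
    """The numbers <f, chi_k> for k = 0, ..., n-1."""
    return [
        sum(f[x] * chi(k, x, n).conjugate() for x in range(n)) / n
        for k in range(n)
    ]

def invert(coeffs, n):
    """Rebuild f from its coefficients: f = sum over k of <f, chi_k> chi_k."""
    return [sum(coeffs[k] * chi(k, x, n) for k in range(n)) for x in range(n)]

f = [3, -1, 4, 1, -5, 9, 2]          # arbitrary sample values on Z/7Z
n = len(f)
coeffs = fourier_coefficients(f, n)
recovered = invert(coeffs, n)
lhs = sum(abs(v) ** 2 for v in f) / n        # (1/|G|) * sum |f(x)|^2
rhs = sum(abs(c) ** 2 for c in coeffs)       # sum |<f, chi>|^2
print([round(v.real, 6) for v in recovered], round(lhs, 6), round(rhs, 6))
```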


Remark 7.A.6. More generally, we have

$$\langle f,g\rangle = \sum_{\chi\in\hat{G}}\langle f,\chi\rangle\overline{\langle g,\chi\rangle}$$

for all $f,g \in F(G,\mathbb{C})$, as can be deduced by an obvious adaptation of the proof
of Plancherel's identity.


By analogy with the usual formula in classical Fourier analysis

$$\hat{f}(n) = \frac{1}{2\pi}\int_0^{2\pi} f(y)e^{-iny}\,dy,$$

we write $\hat{f}(\chi) = \langle f,\chi\rangle$ and call it the $\chi$-Fourier coefficient of $f$.


7.A.2 A glimpse of the non-abelian case 


One may ask whether the above theory can be adapted to the case when 
the group G is still finite, but no longer abelian. The answer is positive, but 
things are much more complicated than in the abelian case. In order to develop 
Fourier analysis on any finite group G, one has to consider all complex finite 
dimensional representations of G. By definition, such a representation consists 
of a finite dimensional C-vector space V on which the group G acts linearly, i.e. 
for each element g € G one has an automorphism still denoted g of V such that 
gh = goh (in the left-hand side gh is the automorphism associated to gh € G, 
while in the right-hand side we have a composition of automorphisms). In more 
down to earth terms, a representation of $G$ is a morphism² $\rho : G \to GL_n(\mathbb{C})$ for
some $n \ge 1$. Two representations $\rho_1, \rho_2 : G \to GL_n(\mathbb{C})$ are called isomorphic
if there exists $P \in GL_n(\mathbb{C})$ such that $\rho_2(g) = P\rho_1(g)P^{-1}$ for all $g \in G$.


²Recall that $GL_n(\mathbb{C})$ is the group of matrices $A \in M_n(\mathbb{C})$ with nonzero determinant.




If $n = 1$, we call $\rho$ a character (this is compatible with the definition of a
character given in the previous section). Such a representation V is called 
irreducible if no proper subspace of V is stable under all automorphisms g, 
with g € G. For instance, a character is an irreducible representation, as there 
is no proper subspace at all! Moreover, the only irreducible representations of 
a finite abelian group are the characters, so the new theory will be compatible 
with the theory developed in the previous section. To prove the previous 
assertion, one has to use a basic result from linear algebra, stating that any 
commutative family of endomorphisms of a finite dimensional vector space 
over an algebraically closed field has a common eigenvector. So, if G is an 
abelian group acting irreducibly on V, there is a common eigenvector v for all 
g € G. But then Cv is a nonzero subspace of V stable under all g € G and so 
we must have Cv = V; hence V is a character. 
We have the following very basic theorem due to Maschke: 


Theorem 7.A.7. Any finite dimensional $\mathbb{C}$-representation $V$ of a finite group
is a direct sum of irreducible representations, that is there exist sub-vector-spaces
$V_1,\dots,V_p$ of $V$ stable under $G$, which are irreducible representations
and such that $V = \oplus_i V_i$.


Now, the role of the dual $\hat{G}$ of $G$ in the abelian case is played by the set
$\hat{G}$ of (isomorphism classes of) irreducible representations of $G$. It turns out
that this is a finite set and by the previous discussion its definition agrees with
the usual definition if $G$ is abelian. The set of maps $F(G,\mathbb{C})$ is naturally a
$\mathbb{C}$-algebra and as such it is isomorphic to the group algebra $\mathbb{C}[G]$. By definition,
the elements of the last object can be uniquely written $\sum_{g\in G} a_g\cdot g$ with $a_g \in \mathbb{C}$
(so the elements of $G$ form a $\mathbb{C}$-basis of $\mathbb{C}[G]$) and multiplication is defined by

$$\Big(\sum_{g\in G} a_g g\Big)\cdot\Big(\sum_{g\in G} b_g g\Big) = \sum_{g\in G}\Big(\sum_{hk=g} a_h b_k\Big) g.$$

Note that $\mathbb{C}[G]$ itself is a representation of $G$ (each element $g$ of $G$ acting as
multiplication by $g$). The following theorem summarizes the properties of
$\mathbb{C}[G]$.




Theorem 7.A.8. 1) $\hat{G}$ is a finite set and $\mathbb{C}[G]$ is isomorphic as a representation
of $G$ to $\oplus_{V\in\hat{G}}(\dim V)V$, where $nV = V\oplus V\oplus\dots\oplus V$ ($n$ times).
In particular,

$$|G| = \sum_{V\in\hat{G}}(\dim V)^2.$$

2) The center of $\mathbb{C}[G]$ consists of all maps $f : G \to \mathbb{C}$ such that $f(g) = f(hgh^{-1})$
for all $g,h \in G$. Its dimension over $\mathbb{C}$ is equal to $|\hat{G}|$ and also
to the number of conjugacy classes³ of $G$.


In the abelian case, we defined an inner product on $\mathbb{C}[G]$ and we showed
that the characters of $G$ formed an orthonormal basis of $\mathbb{C}[G]$. All this can
be done in the non-abelian case, though things are usually more difficult to
prove. Namely, define for $f_1, f_2 \in \mathbb{C}[G]$

$$\langle f_1,f_2\rangle = \frac{1}{|G|}\sum_{g\in G} f_1(g)\overline{f_2(g)}.$$

Now, to any representation $\rho : G \to GL_n(\mathbb{C})$ one can associate its character,
which is the element of $\mathbb{C}[G]$ defined by $\chi_\rho(g) = \mathrm{Tr}(\rho(g))$. The main theorem
of Fourier theory on finite groups is then the following:


Theorem 7.A.9. 1) If $V_1, V_2$ are two representations of $G$ such that $\chi_{V_1} = \chi_{V_2}$
as elements of $\mathbb{C}[G]$, then $V_1$ and $V_2$ are isomorphic (and conversely).

2) $(\chi_V)_{V\in\hat{G}}$ is an orthonormal basis of the center of $\mathbb{C}[G]$ for the previously
defined inner product. Moreover, a representation $V$ is irreducible if and
only if $\langle\chi_V,\chi_V\rangle = 1$.

3) If $g,h \in G$, then $\sum_{V\in\hat{G}}\chi_V(g)\overline{\chi_V(h)}$ is equal to the cardinality of the
centralizer of $g$ in $G$ if $g,h$ are conjugate in $G$ and it is equal to $0$
otherwise.


³The conjugacy class of an element $g \in G$ is $\{hgh^{-1} \mid h \in G\}$.




7.A.3 Some concrete examples 


Let us specialize the previously quite abstract (at first...) theory to a 
very concrete situation that appears frequently in number theory. Let N be 
an integer greater than 1 and let 


$$G = (\mathbb{Z}/N\mathbb{Z})^{\times}$$


be the abelian group of invertible residue classes mod N. A character of G is 
called a Dirichlet character of modulus N or simply a Dirichlet character mod 
N. If x is such a character, we set 


$$\chi(n) = 1_{\gcd(n,N)=1}\cdot\chi(n \bmod N)$$

for all integers $n$, obtaining in this way an $N$-periodic function. We will focus
on a more restricted class of characters modulo $N$, namely the primitive ones.
Let us recall what that means. If $d$ divides $N$ and if $\chi_d$ is a character mod
$d$, then $\chi_d$ yields a character mod $N$ simply by composing it with the natural
map $(\mathbb{Z}/N\mathbb{Z})^{\times} \to (\mathbb{Z}/d\mathbb{Z})^{\times}$. We say that a character mod $N$ is primitive if it is
not obtained in this way, for any proper divisor $d$ of $N$ and any $\chi_d$. A more
practical and useful criterion is the following: a character $\chi$ mod $N$ is primitive
if and only if for any proper divisor $d$ of $N$ there exists $n \equiv 1 \pmod d$ such
that $\gcd(n,N) = 1$ and $\chi(n) \ne 1$. We leave to the reader the easy task of
checking this.

Let us see what happens when we apply the abstract theory to this situation.
Let $a$ be an integer prime to $N$ and let $f$ be the map $f(n) = 1_{n\equiv a \pmod N}$.
It is naturally a map on $G$ and it is clear from the definitions that for all characters
$\chi$ of $G$ we have $\hat{f}(\chi) = \frac{1}{\varphi(N)}\overline{\chi(a)}$. So, using Fourier's inversion formula, we
obtain

$$1_{n\equiv a \pmod N} = \frac{1}{\varphi(N)}\sum_{\chi}\overline{\chi(a)}\chi(n),$$

the sum being taken over all Dirichlet characters mod $N$. This relation holds
for all $n$ prime to $N$ and plays a crucial role in the proof of the famous
Dirichlet's theorem, to be discussed in the next section.
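When $N = p$ is prime the Dirichlet characters are easy to write down explicitly, which allows a direct check of the last formula. The sketch below is only an illustration: the function names are ours, and the values $p = 7$ with primitive root $g = 3$ are assumptions made for the example. Each character is defined by $\chi_k(g^j) = e^{2i\pi jk/(p-1)}$.

```python
import cmath

def dirichlet_characters(p, g):
    """All characters of (Z/pZ)^x, for a prime p and a primitive root g mod p."""
    index = {}
    value = 1
    for j in range(p - 1):
        index[value] = j          # discrete logarithm of g^j
        value = value * g % p
    assert len(index) == p - 1, "g must be a primitive root mod p"
    return [
        (lambda n, k=k: cmath.exp(2j * cmath.pi * k * index[n % p] / (p - 1)))
        for k in range(p - 1)
    ]

def indicator_via_characters(n, a, chars):
    """(1/phi(N)) * sum over chi of conjugate(chi(a)) * chi(n)."""
    return sum(chi(a).conjugate() * chi(n) for chi in chars) / len(chars)

p, g = 7, 3
chars = dirichlet_characters(p, g)
for a in range(1, p):
    print([round(abs(indicator_via_characters(n, a, chars))) for n in range(1, p)])
```

Row $a$ of the output should contain a single $1$, in position $n = a$.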




Gauss sums and Fourier coefficients of Dirichlet characters 


To any Dirichlet character $\chi$ mod $N$ we attached an $N$-periodic function
on $\mathbb{Z}$, defined by $\chi(n) = 1_{\gcd(n,N)=1}\cdot\chi(n \bmod N)$. Any $N$-periodic function on $\mathbb{Z}$
induces a map on $\mathbb{Z}/N\mathbb{Z}$ and we can look at its Fourier coefficients with respect
to this finite abelian group. As we have already seen, the characters of $\mathbb{Z}/N\mathbb{Z}$
are identified with $\mathbb{Z}/N\mathbb{Z}$ (we identify $a$ and the character $x \mapsto e^{\frac{2i\pi ax}{N}}$). Via
this identification, the Fourier coefficients of a map $f$ on $\mathbb{Z}/N\mathbb{Z}$ will be denoted

$$\hat{f}(a) = \frac{1}{N}\sum_{x\in\mathbb{Z}/N\mathbb{Z}} f(x)e^{-\frac{2i\pi ax}{N}}.$$


The following result gives further information about these coefficients when f 
comes from a Dirichlet character: 


Proposition 7.A.10. Let $\chi$ be a Dirichlet character mod $N$.

1) For any $a$ relatively prime to $N$ we have

$$\hat{\chi}(a) = \overline{\chi(a)}\,\hat{\chi}(1).$$

2) If $\chi$ is primitive, then $\hat{\chi}(a) = 0$ whenever $\gcd(a,N) > 1$ and we have

$$|\hat{\chi}(a)| = \frac{1}{\sqrt{N}}$$

for $\gcd(a,N) = 1$.
Proof. 1) Since $\chi(x) = 0$ whenever $\gcd(x,N) > 1$, we have

$$\hat{\chi}(a) = \frac{1}{N}\sum_{x\in G}\chi(x)\zeta^{-ax},$$

where $\zeta = e^{\frac{2i\pi}{N}}$. As $x \mapsto ax$ is a permutation of $G$, we have

$$\chi(a)\hat{\chi}(a) = \frac{1}{N}\sum_{x\in G}\chi(ax)\zeta^{-ax} = \frac{1}{N}\sum_{x\in G}\chi(x)\zeta^{-x} = \hat{\chi}(1)$$

and the result follows from $|\chi(a)| = 1$.
2) Write $N = dv$ and $a = du$ with $d > 1$ and $\gcd(u,v) = 1$. Let

$$\zeta = e^{\frac{2i\pi u}{v}},$$

a primitive root of unity of order $v$. Note that

$$\hat{\chi}(a) = \frac{1}{N}\sum_{x \pmod N}\chi(x)\zeta^{-x} = \frac{1}{N}\sum_{j \pmod v}\Big(\sum_{k \pmod d}\chi(j+kv)\Big)\zeta^{-j}.$$

It is thus enough to prove that

$$S_j = \sum_{k \pmod d}\chi(j+kv)$$

vanishes for all $j$. Now, since $\chi$ is primitive, there exists $n \equiv 1 \pmod v$
such that $\gcd(n,N) = 1$ and $\chi(n) \ne 1$. It is easy to see that $(n(j+kv) \pmod N)_{k \pmod d}$
is simply a permutation of $(j+kv \pmod N)_{k \pmod d}$. Thus

$$\chi(n)S_j = \sum_{k \pmod d}\chi(n(j+kv)) = \sum_{k \pmod d}\chi(j+kv) = S_j$$

and since $\chi(n) \ne 1$, we have $S_j = 0$. This proves the first part of 2).
Finally, using Plancherel's identity (theorem 7.A.5) for $\mathbb{Z}/N\mathbb{Z}$ and the
information we have already gathered, we deduce that

$$\frac{\varphi(N)}{N} = \frac{1}{N}\sum_{x \pmod N}|\chi(x)|^2 = \sum_{a \pmod N}|\hat{\chi}(a)|^2 = \sum_{\gcd(a,N)=1}|\hat{\chi}(1)|^2 = \varphi(N)|\hat{\chi}(1)|^2,$$

from where we obtain $|\hat{\chi}(1)| = \frac{1}{\sqrt{N}}$. Using again part 1), the result follows. □
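Both parts of the proposition can be observed on the simplest primitive character, the Legendre symbol modulo an odd prime $p$ (a nontrivial character modulo a prime is automatically primitive). The sketch below is a numerical illustration with our own helper names and the arbitrary choice $p = 11$; it computes $\hat{\chi}(a)$ from the definition.

```python
import cmath
import math

def legendre(x, p):
    """Legendre symbol of x modulo an odd prime p, extended by 0 at multiples of p."""
    r = pow(x, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def chi_hat(a, p):
    """Fourier coefficient (1/p) * sum over x of chi(x) * exp(-2*i*pi*a*x/p)."""
    return sum(
        legendre(x, p) * cmath.exp(-2j * cmath.pi * a * x / p) for x in range(p)
    ) / p

p = 11
for a in range(p):
    print(a, round(abs(chi_hat(a, p)), 6), round(1 / math.sqrt(p), 6))
```

Since the Legendre symbol is real, $\overline{\chi(a)} = \chi(a)$, so part 1) reads $\hat{\chi}(a) = \chi(a)\hat{\chi}(1)$ here.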


7.A.4 Some applications 


We have developed enough theory in the previous sections to be able to 
prove quite a few nontrivial results. The first one is very elementary, but tricky 
and taken from [71]. The interested reader will find in loc.cit many beautiful 
applications of finite Fourier analysis. 


Proposition 7.A.11. Let $A$ be a finite set of integers and let $f : A \to \mathbb{Z}/p\mathbb{Z}$
be a map. Then for any positive integer $k$ there exist at least $\frac{|A|^{2k}}{p}$ $(2k)$-tuples
$(a_1,\dots,a_{2k}) \in A^{2k}$ such that

$$f(a_1) + f(a_2) + \dots + f(a_k) \equiv f(a_{k+1}) + f(a_{k+2}) + \dots + f(a_{2k}) \pmod p.$$
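Proof. Let $N(j)$ be the number of $2k$-tuples $(a_1,\dots,a_{2k}) \in A^{2k}$ such that

$$f(a_1) + f(a_2) + \dots + f(a_k) \equiv f(a_{k+1}) + f(a_{k+2}) + \dots + f(a_{2k}) + j \pmod p.$$

Clearly $\sum_{j=0}^{p-1} N(j) = |A|^{2k}$. We will prove that $N(0) \ge N(j)$ for all $j$, which
will be enough to deduce the desired result. Next, note that by the orthogonality
relations we can write

$$N(j) = \sum_{a_1,\dots,a_{2k}\in A}\frac{1}{p}\sum_{b=0}^{p-1} e^{\frac{2i\pi}{p}b\left(f(a_1)+\dots+f(a_k)-f(a_{k+1})-\dots-f(a_{2k})-j\right)}$$

$$= \frac{1}{p}\sum_{b=0}^{p-1} e^{-\frac{2i\pi bj}{p}}\Big(\sum_{a\in A} e^{\frac{2i\pi bf(a)}{p}}\Big)^{k}\Big(\sum_{a\in A} e^{-\frac{2i\pi bf(a)}{p}}\Big)^{k}$$

$$= \frac{1}{p}\sum_{b=0}^{p-1} e^{-\frac{2i\pi bj}{p}}\Big|\sum_{a\in A} e^{\frac{2i\pi bf(a)}{p}}\Big|^{2k}.$$

But it is apparent in this last expression that $N(j) \le N(0)$, finishing the
proof. □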
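The inequality $N(0) \ge N(j)$ can be watched in action on a small example. In the sketch below, `values` is a hypothetical list of the residues $f(a)$ for $a \in A$ (so repetitions are allowed), and the parameters $p = 5$, $k = 2$ are arbitrary; the counts $N(j)$ are obtained by plain enumeration of all $2k$-tuples.

```python
from itertools import product

def counts(values, p, k):
    """N(j) for j = 0, ..., p-1, where values lists f(a) for the elements a of A."""
    N = [0] * p
    for t in product(values, repeat=2 * k):
        # j is the difference between the first k and the last k values.
        j = (sum(t[:k]) - sum(t[k:])) % p
        N[j] += 1
    return N

values, p, k = [0, 1, 1, 3], 5, 2
N = counts(values, p, k)
print(N, len(values) ** (2 * k) / p)
```

Because the $N(j)$ add up to $|A|^{2k}$ and $N(0)$ is the largest of them, $N(0)$ is at least the average $|A|^{2k}/p$.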


The method used to prove the following result is extremely useful in ad- 
ditive combinatorics. 


Theorem 7.A.12. (D. Hart, A. Iosevich)
Let $q$ be a prime power, $d$ a positive integer and let $A \subset \mathbb{F}_q^d$ be a set with
more than $q^{\frac{d+1}{2}}$ elements. Then for any $x \in \mathbb{F}_q^*$ one can find $a,b \in A$ such that
$a\cdot b = x$ (here $\cdot$ is the standard inner product in $\mathbb{F}_q^d$).


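Proof. The idea is to express the number of solutions of the equation $a\cdot b = x$
using the orthogonality relations, then analyze the error term. Let $\chi$ be a
nontrivial additive character of $\mathbb{F}_q$ and let $n(x)$ be the number of solutions of
the equation $a\cdot b = x$ with $a,b \in A$. The orthogonality relations yield

$$n(x) = \sum_{a,b\in A}\frac{1}{q}\sum_{c\in\mathbb{F}_q}\chi(c(a\cdot b - x)) = \frac{|A|^2}{q} + R,$$

where

$$R = \frac{1}{q}\sum_{a\in A}\sum_{b\in A}\sum_{c\in\mathbb{F}_q^*}\chi(c(a\cdot b - x)).$$

Now, using the Cauchy-Schwarz inequality and once again the orthogonality
relations, we obtain

$$R^2 \le \frac{|A|}{q^2}\sum_{a\in A}\Big|\sum_{b\in A}\sum_{c\ne 0}\chi(c(a\cdot b - x))\Big|^2$$

$$\le \frac{|A|}{q^2}\sum_{a\in\mathbb{F}_q^d}\sum_{b_1,b_2\in A}\sum_{c_1,c_2\in\mathbb{F}_q^*}\chi(a\cdot(c_1b_1 - c_2b_2))\chi(x(c_2 - c_1))$$

$$= \frac{|A|}{q^2}\sum_{b_1,b_2\in A}\sum_{c_1,c_2\in\mathbb{F}_q^*}\chi(x(c_2 - c_1))\sum_{a\in\mathbb{F}_q^d}\chi(a\cdot(c_1b_1 - c_2b_2))$$

$$= q^{d-2}|A|\sum_{b_1,b_2\in A}\sum_{c_1,c_2\in\mathbb{F}_q^*}\chi(x(c_2 - c_1))1_{c_1b_1=c_2b_2}.$$

Using the orthogonality relations and the substitution $s_1 = \frac{c_1}{c_2}$, $s_2 = c_2$,
we can write

$$\sum_{b_1,b_2\in A}\sum_{c_1,c_2\in\mathbb{F}_q^*}\chi(x(c_2 - c_1))1_{c_1b_1=c_2b_2} = \sum_{b_1,b_2\in A}\sum_{s_1\in\mathbb{F}_q^*}1_{s_1b_1=b_2}\sum_{s_2\in\mathbb{F}_q^*}\chi(xs_2(1 - s_1))$$

$$= \sum_{b_1,b_2\in A}(q-1)1_{b_1=b_2} - \sum_{b_1,b_2\in A}\sum_{s_1\ne 1}1_{s_1b_1=b_2}$$

$$\le (q-1)\sum_{b_1,b_2\in A}1_{b_1=b_2} < q|A|.$$

Combining this with the first formula for $n(x)$, we get $|R| < q^{\frac{d-1}{2}}|A|$ and deduce that $n(x) > 0$
if $|A| > q^{\frac{d+1}{2}}$, from where the result follows. □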
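In the smallest cases the theorem can be verified by exhausting all subsets. The sketch below assumes that $q$ is prime, so that $\mathbb{F}_q$ can be modelled by integers modulo $q$; the helper names are ours. For $q = 3$, $d = 2$ the threshold is $3^{3/2} \approx 5.2$, so it suffices to test all subsets of size $6$ (a larger set contains one of these).

```python
from itertools import combinations, product

def dot_products(A, q):
    """The set of values a.b mod q with a, b in A (a = b is allowed)."""
    return {sum(u * v for u, v in zip(a, b)) % q for a in A for b in A}

def check(q, d, size):
    """True if every subset of F_q^d with `size` elements represents all nonzero x."""
    points = list(product(range(q), repeat=d))
    nonzero = set(range(1, q))
    return all(
        nonzero <= dot_products(A, q) for A in combinations(points, size)
    )

print(check(3, 2, 6), check(2, 2, 3))
```

The search is exponential in the number of points, so this approach is only meant for toy parameters.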


Corollary 7.A.13. Let $A \subset \mathbb{F}_q$ be a subset with more than $q^{\frac{1}{2}+\frac{1}{2d}}$ elements.
Then for any $x \in \mathbb{F}_q^*$ there exist $a_1,\dots,a_d \in A$ and $b_1,\dots,b_d \in A$ such that
$x = a_1b_1 + a_2b_2 + \dots + a_db_d$.




Proof. Apply the previous theorem to the subset $A \times A \times \dots \times A$ of $\mathbb{F}_q^d$. □


The following bound on character sums is a basic tool in analytic number 
theory. 


Theorem 7.A.14. (Polya-Vinogradov) Let $\chi$ be a primitive character modulo
$N$. Then for all positive integers $m,n$ we have

$$\Big|\sum_{m\le j<n}\chi(j)\Big| < \sqrt{N}\log N.$$

Before giving the proof, let us mention that Schur proved that if $\chi$ is a
primitive character mod $N$, then

$$\max_{M}\Big|\sum_{n\le M}\chi(n)\Big| > \frac{1}{2\pi}\sqrt{N},$$

so the Polya-Vinogradov inequality is not far from being optimal.


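Proof. Using Fourier's inversion formula and proposition 7.A.10, we can write

$$\chi(j) = \sum_{\gcd(a,N)=1}\hat{\chi}(a)e^{\frac{2i\pi aj}{N}},$$

the sum being over $1 \le a \le N$. Adding this over $j$, using the triangle inequality and proposition 7.A.10, we
deduce that

$$\Big|\sum_{m\le j<n}\chi(j)\Big| = \Big|\sum_{\gcd(a,N)=1}\hat{\chi}(a)\frac{e^{\frac{2i\pi an}{N}} - e^{\frac{2i\pi am}{N}}}{e^{\frac{2i\pi a}{N}} - 1}\Big| \le \frac{1}{\sqrt{N}}\sum_{\gcd(a,N)=1}\frac{1}{\left|\sin\frac{\pi a}{N}\right|}.$$

An easy convexity argument shows that $\sin x \ge \frac{2x}{\pi}$ for $0 \le x \le \frac{\pi}{2}$. Applying
this to $x = \frac{\pi a}{N}$ (respectively $x = \frac{\pi(N-a)}{N}$) if $a \le \frac{N}{2}$ (respectively $a > \frac{N}{2}$), we
deduce that

$$\Big|\sum_{m\le j<n}\chi(j)\Big| \le \sqrt{N}\cdot\sum_{a<N/2}\frac{1}{a} < \sqrt{N}\log N$$

and the result follows. □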
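The inequality is pleasant to test for the Legendre symbol modulo an odd prime $p$, which is a primitive character. The sketch below (helper names are ours) computes the largest absolute value of a sum over an interval by tracking the extreme values of the partial sums over one period, which is enough because the sum over a full period vanishes.

```python
import math

def legendre(x, p):
    """Legendre symbol of x modulo an odd prime p."""
    r = pow(x, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def max_interval_sum(p):
    """Largest |sum of chi(j) over an interval|, via extreme partial sums."""
    prefix, low, high = 0, 0, 0
    for x in range(1, p):
        prefix += legendre(x, p)
        low, high = min(low, prefix), max(high, prefix)
    return high - low

def is_prime(m):
    return m >= 2 and all(m % d for d in range(2, int(math.isqrt(m)) + 1))

for p in (7, 101, 1009):
    print(p, max_interval_sum(p), math.sqrt(p) * math.log(p))
```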




Here is a nice application of the previous theorem, taken from [59]. We 
refer the reader to that paper for many refinements and further discussion: 


Theorem 7.A.15. (Murty) Let $p$ be a prime and let $q$ be a prime divisor of
$p-1$. Let $a$ be an integer such that $a^{\frac{p-1}{q}} \equiv 1 \pmod p$. Then there exists an
integer $x$ such that $|x| < \frac{p^{3/2}\log p}{q}$ and $x^q \equiv a \pmod p$.

In [59], Murty proves that the hypothesis that $q$ is prime is not necessary
and that we can strengthen the conclusion by asking that $|x| < cp^{3/2}/q$
for a suitable absolute constant $c$. This requires some stronger estimates on
character sums.


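Proof. Consider a parameter $p > T > 1$ and look at the number $S$ of solutions
of the congruence $x^q \equiv a \pmod p$ with $x\in[1,T]$. Using the orthogonality
relations, we write this as

$$\begin{aligned}
S &= \sum_{n\le T} 1_{n^q\equiv a \pmod p}\\
&= \frac{1}{p-1}\sum_{n\le T}\ \sum_{\chi \pmod p}\bar\chi(a)\chi(n^q)\\
&= \frac{1}{p-1}\sum_{\chi \pmod p}\bar\chi(a)\sum_{n\le T}\chi^q(n).
\end{aligned}$$

In the previous sum we will distinguish those characters $\chi$ such that $\chi^q = 1$
from the others. Indeed, if $\chi^q = 1$, then $\sum_{n\le T}\chi^q(n) = T$ and $\chi(a) = 1$ (since
the hypothesis on $a$ implies that $a$ is a $q$th power in $\mathbb{F}_p^*$), while if $\chi^q \ne 1$,
Pólya-Vinogradov's theorem gives

$$\left|\sum_{n\le T}\chi^q(n)\right| < \sqrt{p}\log p.$$

Thus, since there are precisely $q$ characters $\chi$ such that $\chi^q = 1$, the previous
remarks yield

$$\left|S-\frac{qT}{p-1}\right| \le \frac{p-1-q}{p-1}\sqrt{p}\log p < \sqrt{p}\log p.$$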




Now, everything follows from the simple observation that if $S \ne 0$, then there
exists $1\le n\le T$ such that $n^q \equiv a \pmod p$, while if $S = 0$, then

$$T < \frac{(p-1)\sqrt{p}\log p}{q} < \frac{p^{3/2}\log p}{q}. \qquad\square$$


If $p$ is a prime, let $n_0(p)$ be the smallest positive integer $a$ such that
$a$ is a quadratic non-residue mod $p$. It is an easy exercise to check that
$n_0(p) < 1+\sqrt{p}$, but it is much more challenging to find nontrivial bounds for
$n_0(p)$. The next result gives such an upper bound, by combining the Pólya-Vinogradov
inequality with Mertens' theorems.
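
To get a feeling for how small the least quadratic non-residue is in practice, here is a small sketch that computes it by Euler's criterion and prints it next to the elementary bound $1+\sqrt{p}$. The function name `least_nonresidue` is a hypothetical helper, and the primes in the loop are arbitrary samples.

```python
import math

def least_nonresidue(p):
    """Smallest positive a that is a quadratic non-residue modulo the odd prime p."""
    for a in range(2, p):
        # Euler's criterion: a is a non-residue iff a^((p-1)/2) = -1 (mod p)
        if pow(a, (p - 1) // 2, p) == p - 1:
            return a
    raise ValueError("p must be an odd prime")

for p in (7, 23, 71, 311, 10007):
    print(p, least_nonresidue(p), 1 + math.sqrt(p))
```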


Theorem 7.A.16. (Vinogradov) If $p$ is a sufficiently large prime number,
then there exists $1\le a\le p^{\frac{1}{2\sqrt{e}}}\log^2 p$ such that

$$\left(\frac{a}{p}\right) = -1.$$


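Proof. Let $m = [\sqrt{p}\log^2 p]$ and suppose that

$$\left(\frac{a}{p}\right) = 1$$

for all $1\le a\le X = [p^{\frac{1}{2\sqrt e}}\log^2 p]$. Let $N$ be the number of quadratic non-residues
among $\{1,2,\dots,m\}$. By Pólya-Vinogradov's inequality

$$|m-2N| = \left|\sum_{x=1}^{m}\left(\frac{x}{p}\right)\right| < \sqrt{p}\log p,$$

so $N > \frac{m}{2}-\frac{1}{2}\sqrt{p}\log p$. On the other hand, any quadratic non-residue mod $p$
must have a prime factor greater than $X$. Thus

$$\frac{m}{2}-\frac{1}{2}\sqrt p\log p < N \le \sum_{X<q\le m}\left[\frac{m}{q}\right] \le \sum_{X<q\le m}\frac{m}{q},$$

the sums being taken over primes $q$. Mertens' theorem (theorem 3.A.5 and the remark following it) yields

$$\sum_{X<q\le m}\frac{m}{q} = m\log\frac{\log m}{\log X} + O\left(\frac{m}{\log X}\right).$$

Note that $\frac{m}{\log X} = O(\sqrt p\log p)$ and

$$\begin{aligned}
\log\frac{\log m}{\log X} &= \log\frac{\frac12\log p+2\log\log p}{\frac{1}{2\sqrt e}\log p+2\log\log p} + O\left(\frac{1}{X\log p}\right)\\
&= \frac12+\log\frac{1+\frac{4\log\log p}{\log p}}{1+\frac{4\sqrt e\log\log p}{\log p}} + O\left(\frac{1}{X\log p}\right)\\
&= \frac12-\frac{4(\sqrt e-1)\log\log p}{\log p}+O\left(\frac{1}{\log p}\right).
\end{aligned}$$

Combining these estimates yields

$$\frac m2-\frac12\sqrt p\log p < \frac m2-4(\sqrt e-1)\,m\,\frac{\log\log p}{\log p}+O(\sqrt p\log p),$$

which is not possible for $p$ large enough. □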


7.A.5  Dirichlet’s theorem 


We will use the previous tools and a bit of complex analysis to give a 
proof of the famous 


Theorem 7.A.17. Let $a$ and $N$ be relatively prime positive integers. For
$s \to 1^+$ we have

$$\sum_{p\equiv a \pmod N}\frac{1}{p^s} = \frac{1}{\varphi(N)}\log\left(\frac{1}{s-1}\right) + O(1).$$

In particular, there are infinitely many primes $p \equiv a \pmod N$.




The proof of theorem 7.A.17 combines Fourier analysis on $G = (\mathbb{Z}/N\mathbb{Z})^*$
and very subtle estimates of the $L$-function of a Dirichlet character near 1. To
start with, define the complex $L$-function of a Dirichlet character $\chi$ mod $N$
by (recall that $\chi(n) = 0$ for $\gcd(n, N) > 1$)

$$L(s,\chi) = \sum_{n\ge 1}\frac{\chi(n)}{n^s}.$$

As $|\chi(n)| \le 1$ for all $n$, this series converges uniformly on compact subsets of
$\mathrm{Re}(s) > 1$, so $L(s, \chi)$ defines a holomorphic function in this region. Moreover,
a simple argument going back to Euler and using the unique factorization
theorem and the fact that $\frac{1}{1-z} = \sum_{n\ge 0} z^n$ for $|z| < 1$ shows that for $\mathrm{Re}(s) > 1$
we have

$$L(s,\chi) = \prod_{p}\frac{1}{1-\chi(p)p^{-s}}.$$

We easily deduce from this that $L(s, \chi)$ does not vanish if $\mathrm{Re}(s) > 1$. Moreover,
if we choose a branch of the complex logarithm, we can write

$$\log L(s,\chi) = \sum_{p}\sum_{n\ge 1}\frac{\chi(p)^n}{np^{ns}} = \sum_{p}\sum_{n\ge 1}\frac{\chi(p^n)}{np^{ns}}.$$

Dirichlet's key insight was to use the Fourier transform to express the condition
that $n \equiv a \pmod N$ in an analytic way. More precisely, multiplying
the previous relation by $\bar\chi(a)$, summing over $\chi$ and using the orthogonality
relations, we obtain for $s > 1$

$$\frac{1}{\varphi(N)}\sum_{\chi \pmod N}\bar\chi(a)\log L(s,\chi) = \sum_{p}\ \sum_{\substack{n\ge 1\\ p^n\equiv a \pmod N}}\frac{1}{np^{ns}} = \sum_{p\equiv a \pmod N}\frac{1}{p^s} + O(1),$$

the last estimate being a consequence of the inequality

$$\sum_{p}\ \sum_{\substack{n\ge 2\\ p^n\equiv a \pmod N}}\frac{1}{np^{ns}} \le \sum_{p}\sum_{n\ge 2}\frac{1}{p^n} = \sum_{p}\frac{1}{p(p-1)} < \infty.$$






The crucial point is that $\log L(s, \chi)$ remains bounded as $s \to 1^+$ precisely
when $\chi$ is nontrivial and that $\log L(s,1)$ (where 1 is the trivial character) is
relatively easy to handle and yields the factor $\log\frac{1}{s-1}$. While the second part is
rather easy to prove, the first part is a very deep result, equivalent to the
non-vanishing of $L(s,\chi)$ at $s = 1$ whenever $\chi$ is nontrivial. Dirichlet's proof
was very roundabout, but we have quite a few different ways to prove this
nowadays. First, let us deal with the "easy" part, which will also be important
in the proof of the hard part.


Theorem 7.A.18. $L(s,\chi)$ extends to a function on $\mathrm{Re}(s) > 0$, which is
holomorphic except possibly at $s = 1$. If $\chi$ is nontrivial, this function is
holomorphic at $s = 1$, but if $\chi$ is trivial, we have

$$\lim_{s\to 1^+}(s-1)L(s,1) = \prod_{p\mid N}\left(1-\frac{1}{p}\right).$$


Proof. The key ingredient is Abel summation. Suppose that $\chi$ is nontrivial
and note that for all $n$ we have $|\chi(1) + \chi(2) + \cdots + \chi(n)| \le N$, which follows
easily from the orthogonality relations (theorem 7.A.5). An easy computation
shows that

$$n^{-s}-(n+1)^{-s} = sn^{-s-1}+O(n^{-s-2}),$$

which is uniform for $s$ in compact sets. Thus the series

$$\sum_{n\ge 1}\big(\chi(1)+\chi(2)+\cdots+\chi(n)\big)\big(n^{-s}-(n+1)^{-s}\big)$$

converges uniformly on compact subsets of $\mathrm{Re}(s) > 0$. Moreover, by Abel
summation, the sum of the series is $L(s,\chi)$ if $\mathrm{Re}(s) > 1$, which yields the
holomorphic extension of $L(s, \chi)$ to $\mathrm{Re}(s) > 0$.

On the other hand, if $\chi$ is trivial, the inclusion-exclusion principle yields

$$L(s,\chi) = \prod_{p\mid N}\left(1-\frac{1}{p^s}\right)\zeta(s), \qquad \zeta(s) = \sum_{n\ge 1}\frac{1}{n^s}.$$

Since

$$\int_n^{n+1}\left(n^{-s}-t^{-s}\right)dt = \frac{1}{n^s}+\frac{(n+1)^{1-s}-n^{1-s}}{s-1},$$

we get

$$\zeta(s) = \frac{1}{s-1}+\sum_{n\ge 1}\int_n^{n+1}\left(n^{-s}-t^{-s}\right)dt.$$

But for $t \in [n,n + 1]$ we have

$$\left|n^{-s}-t^{-s}\right| = \left|s\int_n^t\frac{du}{u^{s+1}}\right| \le \frac{|s|\,|t-n|}{n^{\mathrm{Re}(s)+1}} \le \frac{|s|}{n^{1+\mathrm{Re}(s)}},$$

yielding the uniform convergence of the series

$$g(s) = \sum_{n\ge 1}\int_n^{n+1}\left(n^{-s}-t^{-s}\right)dt$$

on all compact subsets of $\mathrm{Re}(s) > 0$. Thus $g$ is holomorphic on $\mathrm{Re}(s) > 0$ and
since $\zeta(s) = \frac{1}{s-1} + g(s)$, the result follows. □


Taking into account the previous theorem and discussion, theorem 7.A.17 
is a consequence of the difficult 


Theorem 7.A.19. If $\chi$ is nontrivial, then $L(1, \chi) \ne 0$, so $\log L(s, \chi)$ remains
bounded for $s \to 1^+$.


Proof. We saw that for all $s > 1$ we have

$$\frac{1}{\varphi(N)}\sum_{\chi \pmod N}\bar\chi(a)\log L(s,\chi) = \sum_{p}\ \sum_{\substack{n\ge 1\\ p^n\equiv a \pmod N}}\frac{1}{np^{ns}},$$

so by taking $a = 1$ we obtain

$$\prod_{\chi \pmod N}L(s,\chi) \ge 1.$$

But by the previous theorem all $L(s, \chi)$ with $\chi$ nontrivial are holomorphic at
$s = 1$ and $L(s,1)$ has a simple pole at $s = 1$. Combining this observation
with the previous inequality, we deduce that there is at most one nontrivial
character $\chi$ such that $L(1,\chi) = 0$ (otherwise the product of the $L(s, \chi)$ would
vanish for $s \to 1^+$). Assume that $\chi$ is such a character, then clearly

$$L(1,\bar\chi) = \overline{L(1,\chi)} = 0,$$

so that we must have $\chi = \bar\chi$, that is $\chi$ takes only values $\pm 1$ and 0. Thus, it
remains to prove that $L(1,\chi) \ne 0$ whenever $\chi$ is a nontrivial real character.
We will present Paul Monsky's elementary proof from [54].

Consider the function

$$f(x) = \sum_{n\ge 1}\chi(n)\frac{x^n}{1-x^n},$$

defined for $x \in [0, 1)$ (the series is obviously absolutely convergent for such $x$).
We claim that $f$ is unbounded as $x \to 1^-$. Indeed, we can write

$$f(x) = \sum_{n\ge 1}\chi(n)\sum_{j\ge 1}x^{nj} = \sum_{n\ge 1}\Big(\sum_{d\mid n}\chi(d)\Big)x^n,$$

all manipulations of the series being justified by the absolute convergence. Let
$c_n = \sum_{d\mid n}\chi(d)$. If $p$ divides $N$, then $c_{p^k} = 1$ for all $k \ge 1$, as $\chi(p) = 0$. Also, it
is easy to see that $c_{mn} = c_mc_n$ for $\gcd(m,n) = 1$. An immediate computation
shows that $c_{l^k} \ge 0$ for all primes $l$ and all $k$ and using the previous observation
we deduce that $c_n \ge 0$ for all $n$. As infinitely many $c_n$'s are equal to 1 and
all $c_n$ are nonnegative, $\sum_n c_nx^n$ is not bounded as $x \to 1^-$ and the claim is
proved.

Now, assuming that $L(1,\chi) = 0$, we will prove that $f$ is bounded as
$x \to 1^-$, which will finish the proof. If

$$b_n(x) = \frac{1}{n(1-x)}-\frac{x^n}{1-x^n},$$

then the hypothesis $L(1, \chi) = 0$ gives $\sum_n b_n(x)\chi(n) = -f(x)$, so it reduces the
boundedness of $f(x)$ as $x \to 1^-$ to the boundedness of $\sum_n b_n(x)\chi(n)$ as $x \to 1^-$.
The miracle is that the sequence $(b_n(x))_n$ is nonincreasing, as one easily checks
using the AM-GM inequality that

$$b_n(x)-b_{n+1}(x) = \frac{1}{1-x}\left(\frac{1}{n(n+1)}-\frac{x^n}{(1+x+\cdots+x^{n-1})(1+x+\cdots+x^{n})}\right)$$

is nonnegative. Next, using Abel's summation formula, the monotonicity of
the $b_n(x)$ and the fact that the sums $\left|\sum_{i=1}^{n}\chi(i)\right|$ are bounded, it is easy to see that $f$
is bounded. But this contradicts the result of the previous paragraph, ending
the proof of Dirichlet's theorem. □


Once we know that $L(1,\chi) \ne 0$ for all nontrivial characters $\chi$, it is a
natural question to ask how far it is from being zero. If we look carefully at
the proof of the previous theorem, we see that it gives $L(1,\chi) > 0$ for any
nontrivial real Dirichlet character. The following deep theorem gives much
more. The proof is much more involved:

Theorem 7.A.20. (Siegel) For any $\varepsilon > 0$ there exists $c(\varepsilon) > 0$ such that
$L(1,\chi) > \frac{c(\varepsilon)}{N^{\varepsilon}}$ for any real primitive Dirichlet character $\chi$ modulo $N$.





7.A.6 A useful consequence 


The purpose of this section is to give an analogue of Mertens' theorems
for primes in arithmetic progressions. This uses the proof of Dirichlet's theorem
and the standard technique of Abel summation. Recall that $\Lambda$ is von
Mangoldt's function, defined by $\Lambda(n) = \log p$ if there is a prime $p$ and $k \ge 1$
such that $n = p^k$, and $\Lambda(n) = 0$ otherwise. Its usefulness in number theory
comes from the relation $\sum_{d\mid n}\Lambda(d) = \log n$, which will be heavily used in the
proof of the following result.


Theorem 7.A.21. Let $a, N$ be relatively prime positive integers. Then

$$\sum_{\substack{p\le x\\ p\equiv a \pmod N}}\frac{\log p}{p} = \frac{1}{\varphi(N)}\log x + O(1).$$




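Proof. Since

$$\sum_{k\ge 2}\sum_{p}\frac{\log p}{p^k} < \infty,$$

we have

$$\sum_{\substack{p\le x\\ p\equiv a \pmod N}}\frac{\log p}{p} = \sum_{\substack{n\le x\\ n\equiv a \pmod N}}\frac{\Lambda(n)}{n} + O(1).$$

Using the orthogonality relations, we can write

$$\sum_{\substack{n\le x\\ n\equiv a \pmod N}}\frac{\Lambda(n)}{n} = \frac{1}{\varphi(N)}\sum_{\chi}\bar\chi(a)\sum_{n\le x}\frac{\chi(n)}{n}\Lambda(n),$$

so everything comes down to the study of

$$F(x,\chi) = \sum_{n\le x}\frac{\chi(n)\Lambda(n)}{n}.$$

If $\chi$ is trivial, Mertens' theorem 3.A.4 yields

$$F(x,\chi) = \sum_{\substack{p^k\le x\\ \gcd(p,N)=1}}\frac{\log p}{p^k} = \sum_{p\le x}\frac{\log p}{p} + O(1) = \log x + O(1).$$

Suppose now that $\chi$ is nontrivial. We will prove that $F(x,\chi)$ is bounded as
$x \to \infty$, which will finish the proof of the theorem. Exploiting the relation
$\sum_{d\mid n}\Lambda(d) = \log n$, let us consider the expression

$$E(x) = \sum_{n\le x}\frac{\chi(n)}{n}\log n.$$

Since $x \mapsto \frac{\log x}{x}$ is decreasing for $x \ge 3$ and since the partial sums of the
sequence $(\chi(n))_n$ are bounded, Abel summation shows that $E(x) = O(1)$. On
the other hand

$$E(x) = \sum_{d_1d_2\le x}\frac{\chi(d_1d_2)\Lambda(d_1)}{d_1d_2} = \sum_{d_1\le x}\frac{\chi(d_1)\Lambda(d_1)}{d_1}\sum_{d_2\le x/d_1}\frac{\chi(d_2)}{d_2}.$$

Another Abel summation (using the fact that the partial sums $\sum_{k=1}^{n}\chi(k)$ are
bounded) shows that

$$\sum_{k\le x}\frac{\chi(k)}{k} = L(1,\chi) + O\left(\frac{1}{x}\right).$$

We deduce that

$$E(x) = F(x,\chi)L(1,\chi) + O\Big(\frac{1}{x}\sum_{d\le x}\Lambda(d)\Big).$$

Since by theorem 7.A.19 we have $L(1, \chi) \ne 0$, in order to prove that
$F(x, \chi) = O(1)$ it is enough to prove that

$$\frac{1}{x}\sum_{d\le x}\Lambda(d) = O(1).$$

But

$$\sum_{d\le x}\Lambda(d) = \sum_{p^k\le x}\log p \le \sum_{p\le x}\log p\cdot\frac{\log x}{\log p} = \log x\cdot\pi(x) = O(x),$$

using theorem 3.A.2. The result follows. □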


7.A.7 An “elementary proof” of Dirichlet’s theorem 


The proof of Dirichlet’s theorem that we presented used rather heavily 
basic properties of holomorphic functions, but it turns out that the ideas used 
in the proof of theorem 7.A.21 may be combined with careful estimates to
obtain an entirely “elementary” proof of Dirichlet’s theorem, i.e. completely 
avoiding the use of complex analysis. We will sketch such a proof in this 
section, but we must emphasize that the complex analytic approach is much 
more powerful and conceptual. 




We will actually give an elementary proof of theorem 7.A.21, which obviously
implies Dirichlet's theorem. We will use the notations and arguments
of the proof of theorem 7.A.21. The only "non-elementary" result used in the
proof of this theorem is the fact that $L(1, \chi) \ne 0$ for all nontrivial characters
$\chi$. Monsky's proof (see the proof of theorem 7.A.19) deals with the case when
$\chi$ is real in an elementary (but very tricky) way, so it remains to deal with the
case when $\chi$ is not real. Let $k$ be the number of nontrivial characters $\chi$ for
which $L(1, \chi) = 0$ and assume that $k \ge 1$. Then Monsky's result implies that
$k \ge 2$ (since if $L(1,\chi) = 0$ then $L(1,\bar\chi) = 0$). Either a direct computation or
a simple application of Möbius' inversion formula yields the identity


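$$\sum_{d\mid n}\mu(d)\log\frac{x}{d} = 1_{n=1}\cdot\log x + \Lambda(n),$$

from which a standard computation yields

$$\begin{aligned}
F(x,\chi) &= \sum_{n\le x}\frac{\chi(n)\Lambda(n)}{n}\\
&= -\log x + \sum_{d_1d_2\le x}\mu(d_1)\log\frac{x}{d_1}\cdot\frac{\chi(d_1d_2)}{d_1d_2}\\
&= -\log x + \sum_{d_1\le x}\mu(d_1)\log\frac{x}{d_1}\cdot\frac{\chi(d_1)}{d_1}\sum_{d_2\le x/d_1}\frac{\chi(d_2)}{d_2}.
\end{aligned}$$

The argument used in the proof of theorem 7.A.21 shows that $F(x,\chi)$ is
bounded if $L(1,\chi) \ne 0$. Assume that $L(1,\chi) = 0$. Then Abel's summation
formula shows that $\sum_{d\le z}\frac{\chi(d)}{d} = O(1/z)$, so the previous formula yields

$$\begin{aligned}
F(x,\chi) &= -\log x + O\Big(\frac{1}{x}\sum_{d\le x}\log\frac{x}{d}\Big)\\
&= -\log x + O\left(\frac{[x]\log x-\log([x]!)}{x}\right)\\
&= -\log x + O(1).
\end{aligned}$$

The conclusion is that

$$\sum_{\chi}F(x,\chi) = (1-k)\log x + O(1).$$

But this contradicts the fact that $k \ge 2$ and (again by the orthogonality
relations)

$$\sum_{\chi}F(x,\chi) = \varphi(N)\sum_{\substack{n\le x\\ n\equiv 1 \pmod N}}\frac{\Lambda(n)}{n} \ge 0.$$

The reader has probably noticed that the crucial part of this argument was
an "elementary" version of the argument used in the beginning of the proof
of theorem 7.A.19.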


Chapter 8 


Formal Series Revisited 


In this chapter, we discuss applications of generating functions in com- 
binatorics or combinatorial number theory. This is a rather vast subject and 
we will only be able to scratch its surface through some nice examples. The 
idea is very simple (even though in practice things are less clear): given a 
sequence $(a_n)_n$ of complex numbers (but we may allow this sequence to take
values in any ring, in theory), one is sometimes able to extract information
about the sequence from its generating function $\sum_{n\ge 0} a_nX^n$ in an easier way
than by dealing with the sequence itself. For instance, if the sequence satisfies 
a rather complicated recursive relation, it is very hard to study the sequence 
directly, but quite often it turns out that its generating function satisfies some 
differential equations or some regularity properties that allow us to study the 
sequence. Also, in counting problems it is very often much easier to find the 
generating function of the desired number of objects to count than to find di- 
rectly that number. Of course, one sometimes needs quite a lot of imagination 
to extract the information from the generating function, but it should be a 
principle that if one knows the generating function, then one knows a lot of 
things about the sequence. It is important to note that when dealing with gen- 
erating functions one neglects all convergence and analytic issues. However, 
once one knows the generating function, one usually uses analytic arguments 
to study the sequence. 




Before passing to concrete problems, let us recall a few things about op- 
erations with formal series. Addition and multiplication are defined just as for 
polynomials. It is an easy exercise to check that a formal series has an inverse 
(for multiplication) if and only if the constant term is nonzero. A more subtle 
problem is the composition of formal series. Here, one has to be a bit careful, 
as simple things such as )_, (xr do not make sense formally. To do things 
properly, the best way is to introduce the X-adic valuation on formal series: 
if $f = a_0 + a_1X + \cdots$ is a nonzero formal series with complex coefficients (more generally,
with coefficients in any field), let $v_X(f)$ be the least nonnegative integer $n$ such that $a_n \ne 0$.
Thus $f\cdot X^{-v_X(f)}$ is an invertible power series and one easily checks using this
that $v_X(fg) = v_X(f) + v_X(g)$ and $v_X(f + g) \ge \min(v_X(f), v_X(g))$. Thus $v_X$
behaves as the $p$-adic valuation.


Definition 8.1. Say a sequence of formal series $f_n$ converges towards a formal
series $f$ if $v_X(f - f_n)$ tends to $\infty$. If $f = a_0 + a_1X + \cdots$, this means that for
all $i$, the coefficient of $X^i$ in $f_n$ is $a_i$ for all sufficiently large $n$. One says that
$f = \sum_{n\ge 0} f_n$, respectively $f = \prod_{n\ge 0} f_n$ if the sequence $(f_0 + f_1 + \cdots + f_n)_n$
(respectively $(f_0f_1\cdots f_n)_n$) converges to $f$.


It is an easy exercise for the reader to check that $\sum_{n\ge 0} f_n$ converges to a
formal series if and only if $v_X(f_n)$ tends to infinity (the analogous result holds
for $p$-adic numbers). Also, if $f_n$ are formal series such that $f_n(0) = 0$, then
$\prod_{n\ge 0}(1 + f_n)$ converges if and only if $v_X(f_n)$ tends to $\infty$. We can now explain
the composition of two formal series. Assume that $f, g$ are formal series such
that $g(0) = 0$ and $f = a_0 + a_1X + a_2X^2 + \cdots$. The composition $f \circ g$ is then
by definition the formal series

$$f\circ g = a_0 + a_1g + a_2g^2 + \cdots$$

The series converges, because $v_X(g^n) \ge n$. All other operations on formal
series (such as differentiation, integration) do not involve any difficulty and
have the properties that we imagine.

Before passing to problems, let us recall the following useful extension of
the binomial formula: for any rational number $\alpha$ we have

$$(1+X)^{\alpha} = \sum_{n\ge 0}\binom{\alpha}{n}X^n,$$

where $\binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}$. A very useful special case is $\alpha = -m$, where $m$
is a positive integer. In this case the binomial formula becomes

$$\frac{1}{(1-X)^m} = \sum_{n\ge 0}\binom{m+n-1}{m-1}X^n.$$

Finally, we will sometimes use the classical notation $[X^n]f$ for the coefficient
of $X^n$ in $f$.
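
The special case $\alpha=-m$ is easy to check by machine. The sketch below (the function name `inverse_power_coeffs` is an illustrative choice) computes the first coefficients of $1/(1-X)^m$ by repeatedly multiplying by the geometric series, which amounts to taking running sums, and the result can be compared with $\binom{m+n-1}{m-1}$.

```python
from math import comb

def inverse_power_coeffs(m, terms):
    """First `terms` coefficients of 1/(1-X)^m, obtained by multiplying
    the constant series 1 by 1/(1-X) = 1 + X + X^2 + ... exactly m times."""
    coeffs = [1] + [0] * (terms - 1)
    for _ in range(m):
        # multiplying by 1/(1-X) replaces the coefficients by their running sums
        for n in range(1, terms):
            coeffs[n] += coeffs[n - 1]
    return coeffs

print(inverse_power_coeffs(3, 6))
print([comb(3 + n - 1, 3 - 1) for n in range(6)])
```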


8.1 Counting problems 


The first exercise is quite technical, but there is no serious difficulty in it
and the result is quite beautiful. Recall that if $x_n$ is a sequence of complex
numbers, an equivalent for it is a sequence $y_n$ in closed form such that $\frac{x_n}{y_n}$
converges to 1.

1. Let $a_1, a_2, \dots, a_n$ be relatively prime positive integers. Find an equivalent
as $k \to \infty$ for the number of positive integral solutions of the
equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = k$.


Proof. The generating function of the sequence $y_k$, the number of positive
integral solutions of the equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = k$, is

$$\begin{aligned}
f(X) &= \sum_{k\ge 1}y_kX^k\\
&= \sum_{x_1,x_2,\dots,x_n\ge 1}X^{a_1x_1+\cdots+a_nx_n}\\
&= \Big(\sum_{x_1\ge 1}X^{a_1x_1}\Big)\cdots\Big(\sum_{x_n\ge 1}X^{a_nx_n}\Big)\\
&= \prod_{i=1}^{n}\frac{X^{a_i}}{1-X^{a_i}}.
\end{aligned}$$

This is a rational function and in order to analyze the sequence $y_k$ we will
decompose it into simpler pieces. Namely, if $1 = z_1, z_2, \dots, z_r$ are the distinct
roots of the polynomial $\prod_{i=1}^{n}(1 - X^{a_i})$, with multiplicities $m_1, m_2, \dots, m_r$,
then by general theory there exist constants $C_0$ and $C_{ij}$ such that

$$f(X) = C_0 + \sum_{i=1}^{r}\sum_{j=1}^{m_i}\frac{C_{ij}}{(X-z_i)^j}.$$

Using the binomial formula, we obtain

$$\frac{1}{(X-z_i)^j} = (-z_i)^{-j}\left(1-\frac{X}{z_i}\right)^{-j} = (-z_i)^{-j}\sum_{k\ge 0}\binom{j+k-1}{k}z_i^{-k}X^k.$$

Inserting this in the formula for $f$ and re-arranging terms yields, for $k \ge 1$,

$$\begin{aligned}
y_k &= \sum_{i=1}^{r}\sum_{j=1}^{m_i}(-1)^jz_i^{-j-k}C_{ij}\binom{j+k-1}{k}\\
&= \sum_{i=1}^{r}\sum_{j=1}^{m_i}\frac{(-1)^jz_i^{-j-k}C_{ij}}{(j-1)!}(k+1)(k+2)\cdots(k+j-1).
\end{aligned}$$

We claim that the dominant term in the previous sum corresponds to
$i = 1$ and $j = m_1 = n$. First, note that $m_i < n$ for all $i \ne 1$. Indeed, each of the polynomials
$X^{a_j} - 1$ has simple roots, so the only possibility for $m_i \ge n$ would be that $z_i$ is
a root of all $X^{a_j} - 1$, thus also of $X - 1$ (as the $a_j$ are relatively prime) and so $i = 1$.
Thus the contribution to $y_k$ of all terms with $i \ne 1$ is dominated by $ck^{n-2}$ for some
constant $c$. A similar argument shows that the contribution of the terms with $i = 1$ and $j < n$ is
also dominated by a constant times $k^{n-2}$. Thus the main contribution is that
of the term with $i = 1$ and $j = n$, which is $\frac{(-1)^n}{(n-1)!}(k+1)\cdots(k+n-1)C_{1n}$. To
find $C_{1n}$, note that

$$C_{1n} = \lim_{x\to 1}f(x)(x-1)^n = (-1)^n\lim_{x\to 1}\prod_{i=1}^{n}\frac{x^{a_i}(1-x)}{1-x^{a_i}} = \frac{(-1)^n}{a_1a_2\cdots a_n}.$$

Combining the previous observations yields the equivalent $\frac{k^{n-1}}{(n-1)!\prod_{i=1}^{n}a_i}$ for $y_k$. □
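
The equivalent can be compared with exact counts. The following sketch counts positive solutions with a small dynamic program and evaluates $\frac{k^{n-1}}{(n-1)!\prod a_i}$; the names `count_positive_solutions` and `equivalent` and the sample coefficients $(2,3,5)$ are illustrative assumptions.

```python
from math import factorial, prod

def count_positive_solutions(a, k):
    """Number of (x_1,...,x_n) with all x_i >= 1 and a_1 x_1 + ... + a_n x_n = k."""
    ways = [1] + [0] * k          # ways[t]: solutions of total t with the coefficients used so far
    for ai in a:
        new = [0] * (k + 1)
        for t in range(ai, k + 1):
            # x_i = 1 (extend an older solution) or x_i > 1 (extend a newer one)
            new[t] = ways[t - ai] + new[t - ai]
        ways = new
    return ways[k]

def equivalent(a, k):
    """The asymptotic equivalent k^(n-1) / ((n-1)! a_1 ... a_n)."""
    n = len(a)
    return k ** (n - 1) / (factorial(n - 1) * prod(a))

for k in (100, 1000, 3000):
    exact = count_positive_solutions((2, 3, 5), k)
    print(k, exact, exact / equivalent((2, 3, 5), k))
```

The printed ratio should drift towards 1 as $k$ grows, the discrepancy coming from the terms of order $k^{n-2}$.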




The following problem is a bit trickier, but it shows how formal series in 
several variables appear quite naturally. 


2. For positive integers $m$ and $n$, let $f(m,n)$ denote the number of $n$-tuples
$(x_1, x_2, \dots, x_n)$ of integers such that $|x_1| + |x_2| + \cdots + |x_n| \le m$. Show
that $f(m,n) = f(n,m)$.


Putnam 2005 


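Proof. We will use a two-variable generating function in this problem: with
the convention $f(m,0) = f(0,n) = 1$, we have the chain of equalities

$$\begin{aligned}
F(X,Y) &= \sum_{m,n\ge 0}f(m,n)X^mY^n = \sum_{n\ge 0}Y^n\sum_{m\ge 0}X^m\sum_{|x_1|+\cdots+|x_n|\le m}1\\
&= \frac{1}{1-X}\sum_{n\ge 0}Y^n\sum_{k_1,\dots,k_n\in\mathbb{Z}}X^{|k_1|+\cdots+|k_n|} = \frac{1}{1-X}\sum_{n\ge 0}Y^n\Big(\sum_{k\in\mathbb{Z}}X^{|k|}\Big)^n\\
&= \frac{1}{1-X}\sum_{n\ge 0}Y^n\left(1+\frac{2X}{1-X}\right)^n = \frac{1}{1-X}\cdot\frac{1}{1-Y\frac{1+X}{1-X}}\\
&= \frac{1}{1-X-Y-XY}.
\end{aligned}$$

Since $F(X,Y) = F(Y, X)$, it is clear that $f(m,n) = f(n, m)$ for all $m, n$. □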
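
The closed form $\frac{1}{1-X-Y-XY}$ translates into the recurrence $f(m,n)=f(m-1,n)+f(m,n-1)+f(m-1,n-1)$ on the coefficients. As a sanity check, here is a sketch comparing that recurrence with a direct enumeration; `f_bruteforce` and `f_recurrence` are hypothetical helper names.

```python
from itertools import product

def f_bruteforce(m, n):
    """Count n-tuples of integers with |x_1| + ... + |x_n| <= m by enumeration."""
    return sum(1 for xs in product(range(-m, m + 1), repeat=n)
               if sum(abs(x) for x in xs) <= m)

def f_recurrence(m, n):
    """Coefficient of X^m Y^n in 1/(1 - X - Y - XY), using
    f(m, n) = f(m-1, n) + f(m, n-1) + f(m-1, n-1) and f(m, 0) = f(0, n) = 1."""
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = table[i - 1][j] + table[i][j - 1] + table[i - 1][j - 1]
    return table[m][n]

print(f_bruteforce(3, 2), f_bruteforce(2, 3), f_recurrence(3, 2))
```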


3. For $n \ge 3$ and $A \subset \{1,2,\dots,n\}$, say $A$ is even if the sum of the elements
of A is an even number. Otherwise, we say that A is odd. By convention, 
the empty set is even. 

a) Find the number of even, respectively odd subsets of {1,2,...,n}. 
b) Find the sum of the elements of the even, respectively odd subsets 


of {1,2,...,n}. 


Romanian TST 1994 




Proof. The generating function for the sums of the elements of the subsets of
$\{1,2,\dots,n\}$ is $\prod_{i=1}^{n}(1 + X^i)$, because of the obvious identity

$$\prod_{i=1}^{n}(1+X^i) = \sum_{A}X^{m(A)},$$

where $m(A) = \sum_{a\in A}a$. Let $E(n)$, respectively $O(n)$ be the sets of even,
respectively odd subsets of $\{1,2,\dots,n\}$. Taking $X = -1$ in the previous
identity yields $0 = |E(n)| - |O(n)|$. Since we clearly have $|E(n)| + |O(n)| = 2^n$,
we deduce that the number of odd, respectively even subsets is $2^{n-1}$. To answer
the second question, we differentiate the identity, to get

$$\sum_{i=1}^{n}iX^{i-1}\prod_{j\ne i}(1+X^j) = \sum_{A}m(A)X^{m(A)-1}.$$

Taking $X = -1$ yields

$$\sum_{A\in O(n)}m(A) = \sum_{A\in E(n)}m(A).$$

On the other hand, taking $X = 1$ gives

$$\sum_{A\in O(n)}m(A)+\sum_{A\in E(n)}m(A) = n(n+1)2^{n-2}.$$

We deduce that the answer to the second question is $n(n + 1)2^{n-3}$ for both
quantities involved. □
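
Both answers are easy to confirm by exhaustive enumeration for small $n$. A minimal sketch (the helper `subset_statistics` is an illustrative name):

```python
from itertools import combinations

def subset_statistics(n):
    """Return (#even subsets, #odd subsets, total of sums over even subsets,
    total of sums over odd subsets) for the set {1, ..., n}."""
    count = [0, 0]
    total = [0, 0]
    for size in range(n + 1):
        for subset in combinations(range(1, n + 1), size):
            s = sum(subset)
            count[s % 2] += 1      # index 0: even sum, index 1: odd sum
            total[s % 2] += s
    return count[0], count[1], total[0], total[1]

for n in range(3, 8):
    print(n, subset_statistics(n), 2 ** (n - 1), n * (n + 1) * 2 ** (n - 3))
```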


Remark 8.2. Here is an alternative proof for the second part.
If $a_n = \sum_{A\in O(n)}m(A)$ and $b_n = \sum_{A\in E(n)}m(A)$, then

$$a_n = a_{n-1}+b_{n-1}+n\cdot 2^{n-2}, \qquad b_n = b_{n-1}+a_{n-1}+n\cdot 2^{n-2}$$

for $n$ odd, as the odd subsets of $\{1,2,\dots,n\}$ are either odd subsets of
$\{1,2,\dots,n-1\}$ or the union of $\{n\}$ and of an even subset of $\{1,2,\dots,n-1\}$.
A similar analysis shows that

$$a_n = 2a_{n-1}+n\cdot 2^{n-2}, \qquad b_n = 2b_{n-1}+n\cdot 2^{n-2}$$

for even $n$. By induction, we deduce that $a_n = b_n$ for all $n \ge 3$ and then solving
the previous recursive relation (which becomes $a_n = 2a_{n-1} + n\cdot 2^{n-2}$) yields
$a_n = 2^{n-3}n(n + 1)$.


We continue with a very nice problem, though quite simple. 


4. How many polynomials P with coefficients 0, 1, 2, or 3 satisfy P(2) = n, 
where n is a given positive integer? 


Romanian TST 1994 


Proof. Let $a_n$ be the number of such polynomials. Then $a_n$ is also the number
of solutions of the equation $x_0 + 2x_1 + \cdots + 2^kx_k = n$ in integers $x_i \in \{0,1,2,3\}$,
for varying $k$. Thus, the generating function is

$$\sum_{n\ge 0}a_nX^n = \prod_{k\ge 0}\left(1+X^{2^k}+X^{2\cdot 2^k}+X^{3\cdot 2^k}\right).$$

This is also equal to

$$\sum_{n\ge 0}a_nX^n = \frac{1-X^4}{1-X}\cdot\frac{1-X^8}{1-X^2}\cdot\frac{1-X^{16}}{1-X^4}\cdots,$$

which simplifies drastically to

$$\sum_{n\ge 0}a_nX^n = \frac{1}{(1-X)(1-X^2)}.$$

Now, we can decompose the last rational fraction in simple elements, following
the standard procedure. A simple computation yields

$$\frac{1}{(1-X)(1-X^2)} = \frac14\left(\frac{1}{1-X}+\frac{1}{1+X}\right)+\frac{1}{2(1-X)^2}.$$

Expanding this once more, we deduce that the answer is $\left\lfloor\frac n2\right\rfloor + 1$.

Note that we could have avoided the simple elements decomposition, because
$\frac{1}{(1-X)(1-X^2)}$ is simply the generating function for the number of ways
to express $n$ as $a + 2b$, where $a, b$ are nonnegative integers. There are $\left\lfloor\frac n2\right\rfloor + 1$
possible choices for $b$, each of which leaves one choice for $a$. So there are
$\left\lfloor\frac n2\right\rfloor + 1$ ways to express $n$ as $a + 2b$ and the conclusion follows. □
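
The answer $\left\lfloor\frac n2\right\rfloor+1$ can be checked by a direct recursive count, choosing the constant term first. This is only a sketch; the function name `count_polynomials` is an assumption of the example.

```python
def count_polynomials(n):
    """Number of polynomials P with coefficients in {0,1,2,3} and P(2) = n."""
    if n == 0:
        return 1                   # only the zero polynomial
    # choose the constant term c (same parity as n); then P(2) = c + 2*Q(2)
    return sum(count_polynomials((n - c) // 2)
               for c in range(4) if c <= n and (n - c) % 2 == 0)

print([count_polynomials(n) for n in range(1, 11)])
print([n // 2 + 1 for n in range(1, 11)])
```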




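Remark 8.3. Here is another nice solution: if $P(X) = a_0 + a_1X + \cdots + a_kX^k$
satisfies $P(2) = n$, define

$$f(P) = \left\lfloor\frac{a_0}{2}\right\rfloor+2\left\lfloor\frac{a_1}{2}\right\rfloor+\cdots+2^k\left\lfloor\frac{a_k}{2}\right\rfloor.$$

We obtain a map $f$ from the set of polynomials $P$ as in the statement of the
problem, to values in $\left\{0,1,\dots,\left\lfloor\frac n2\right\rfloor\right\}$. It is easy to check that this map is a
bijection.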

In the next two problems, one combines an easy combinatorial argument 
with a generating functions argument. Such mixtures appear very often in 
combinatorial problems and in this case generating functions do the dirty 
work of solving quite complicated recursive relations. Let us start with an 
absolute classic. 


5. In how many different ways can we parenthesize a non-associative product
$x_1x_2\cdots x_n$?

Catalan 

Proof. Let $a_n$ be the desired answer. Note that in a product of $k$, respectively
$n - k$ factors, we can put parentheses in $a_k$, respectively $a_{n-k}$ ways. Looking
at the position at which the first parenthesis ends, we deduce the recursive
relation $a_1 = 1$ and $a_n = \sum_{k=1}^{n-1}a_ka_{n-k}$.

Next, define $f(X) = \sum_{n\ge 1}a_nX^n$ and observe that the recursive relation
implies the chain of equalities

$$f(X)^2 = X^2+\sum_{n\ge 3}\Big(\sum_{k=1}^{n-1}a_ka_{n-k}\Big)X^n = f(X)-X,$$

which implies (taking into account that $f(0) = 0$) that

$$f(X) = \frac{1-\sqrt{1-4X}}{2} = \frac{1-(1-4X)^{1/2}}{2}.$$

Now,

$$\binom{1/2}{n}(-4)^n = (-1)^n\cdot(-1)^{n-1}\,\frac{1\cdot 3\cdots(2n-3)}{2^n\,n!}\cdot 4^n = -\frac{2}{n}\binom{2n-2}{n-1},$$

and so, using the previous relation yields $a_n = \frac1n\binom{2n-2}{n-1}$. □
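
A quick way to double-check the closed form is to run the recursion $a_n=\sum_{k=1}^{n-1}a_ka_{n-k}$ directly, as in the sketch below (`parenthesizations` is an illustrative name).

```python
from math import comb

def parenthesizations(limit):
    """Return [a_1, ..., a_limit] with a_1 = 1 and a_n = sum_{k=1}^{n-1} a_k a_{n-k}."""
    a = [0, 1]                     # a[0] is a placeholder so that a[n] = a_n
    for n in range(2, limit + 1):
        a.append(sum(a[k] * a[n - k] for k in range(1, n)))
    return a[1:]

print(parenthesizations(8))
print([comb(2 * n - 2, n - 1) // n for n in range(1, 9)])
```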


6. Let $F(n)$ be the number of functions $f : \{1,2,\dots,n\} \to \{1,2,\dots,n\}$
with the property that if $i$ is in the range of $f$, then so is $j$, for all $j \le i$.
Prove that

$$F(n) = \sum_{k\ge 0}\frac{k^n}{2^{k+1}}.$$


L. Lovasz, Miklos Schweitzer Competition 


Proof. The key point to obtain a recursive relation for the sequence $F(n)$ is to
look at $|f^{-1}(1)|$. If $|f^{-1}(1)| = j$, then the $j$ elements of $f^{-1}(1)$ can be chosen
in $\binom nj$ ways and $f$ can be defined in $F(n - j)$ ways on the remaining elements.
Thus, there are $\binom nj F(n - j)$ maps $f$ satisfying the property in the statement
of the problem and for which $|f^{-1}(1)| = j$. Summing over all possible values
of $j$, we cover all functions $f$ satisfying the property in the statement, so

$$F(n) = \sum_{j=1}^{n}\binom nj F(n-j) = \sum_{j=0}^{n-1}\binom nj F(j).$$

Here we took by convention $F(0) = 1$. Considering the exponential generating
function $f(X) = \sum_{n\ge 0}F(n)\frac{X^n}{n!}$, the previous relation yields

$$f(X) = 1+(e^X-1)f(X) \implies f(X) = \frac{1}{2-e^X}.$$

Next, we can write

$$f(X) = \frac12\cdot\frac{1}{1-\frac{e^X}{2}} = \sum_{n\ge 0}\frac{e^{nX}}{2^{n+1}}.$$

Expanding now

$$e^{nX} = 1+\frac{nX}{1!}+\frac{(nX)^2}{2!}+\cdots$$

and identifying coefficients in the previous equality yields the desired formula
for $F(n)$. Note¹ that we cheated a bit, since $\sum_{n\ge 0}\frac{e^{nX}}{2^{n+1}}$ does not
converge $X$-adically, but this can be easily fixed: consider the function
$f(z) = \sum_{n\ge 0}\frac{F(n)}{n!}z^n$, defined in a neighborhood of 0 in $\mathbb{C}$. This series
converges absolutely and the previous computations show that $f(z) = \frac{1}{2-e^z}$ if $z$
is close enough to 0. But then we can expand

$$\frac{1}{2-e^z} = \sum_{n\ge 0}\frac{e^{nz}}{2^{n+1}},$$

where this time the series converges in $\mathbb{C}$. Expanding $e^{nz}$, collecting terms
and identifying coefficients yields the result. □
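
For a numerical check of the formula, the sketch below computes $F(n)$ from the recursion of the proof and compares it with a truncation of the series $\sum_k k^n/2^{k+1}$. The names `F` and `series`, and the truncation at 400 terms, are assumptions of the example; Python's convention $0^0=1$ handles the term $k=0$, $n=0$.

```python
from math import comb

def F(n):
    """F(n) from F(0) = 1 and F(t) = sum_{j=1}^{t} C(t, j) F(t - j)."""
    vals = [1]
    for t in range(1, n + 1):
        vals.append(sum(comb(t, j) * vals[t - j] for j in range(1, t + 1)))
    return vals[n]

def series(n, terms=400):
    """Partial sum of k^n / 2^(k+1) over 0 <= k < terms (floating point)."""
    return sum(k ** n / 2 ** (k + 1) for k in range(terms))

for n in range(7):
    print(n, F(n), series(n))
```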


We end this section with two more challenging problems. The first one is 
a very classical result in the theory of finite fields. 


7. Let $N_n$ be the number of irreducible monic polynomials of degree $n$ with
coefficients in $\mathbb{Z}/p\mathbb{Z}$. Then for all $n$ we have $\sum_{d\mid n}dN_d = p^n$.


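Proof. Write $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ and consider the generating function

$$F(X) = \sum_{f\in\mathbb{F}_p[X]}X^{\deg f},$$

the sum being taken over monic polynomials $f \in \mathbb{F}_p[X]$. As there are $p^n$ monic
polynomials of degree $n$ with coefficients in $\mathbb{F}_p$, we have

$$F(X) = \sum_{n\ge 0}p^nX^n = \frac{1}{1-pX}.$$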


¹We thank Richard Stong for pointing this out.




On the other hand, the unique factorization of monic polynomials into
products of irreducible monic polynomials yields

$$F(X) = \prod_{h}\left(1+X^{\deg h}+X^{2\deg h}+\cdots\right) = \prod_{h}\frac{1}{1-X^{\deg h}},$$

the product being taken over the irreducible monic polynomials $h$. Taking the
formal logarithm yields

$$\log\frac{1}{1-pX} = \sum_{h}\log\frac{1}{1-X^{\deg h}} = \sum_{n\ge 1}N_n\log\frac{1}{1-X^n}.$$

The desired formula is easily obtained from this equality and the classical
expansion

$$\log\frac{1}{1-X} = \sum_{k\ge 1}\frac{X^k}{k}. \qquad\square$$


Remark 8.4. Using Möbius' inversion formula and the result of the previous
problem, we obtain

$$N_n = \frac1n\sum_{d\mid n}\mu(d)p^{\frac nd}.$$

It is fairly easy to see from here that $N_n > 0$, so there exists at least one
irreducible polynomial of degree $n$ over $\mathbb{F}_p$. This result is absolutely not trivial.

There is also an arithmetic solution of the previous problem, the idea being
to prove the stronger result

$$\prod_{\substack{f\in\mathbb{F}_p[X]\\ \deg f\mid n}}f = X^{p^n}-X,$$

where the product is taken only over those irreducible monic polynomials
$f \in \mathbb{F}_p[X]$ whose degree divides $n$.
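
The identity $\sum_{d\mid n}dN_d=p^n$ can be tested by brute force for small $p$ and $n$. The sketch below counts irreducible monic polynomials by listing every product of two monic polynomials of smaller positive degree; polynomials are represented as coefficient tuples (lowest degree first), and all function names are illustrative.

```python
from itertools import product

def poly_mul(f, g, p):
    """Multiply two polynomials (coefficient tuples, lowest degree first) over Z/pZ."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return tuple(out)

def monic(degree, p):
    """All monic polynomials of the given degree over Z/pZ."""
    return [coeffs + (1,) for coeffs in product(range(p), repeat=degree)]

def count_irreducible(n, p):
    """N_n: monic polynomials of degree n that are not a product of two
    monic polynomials of smaller positive degree."""
    reducible = set()
    for d in range(1, n // 2 + 1):
        for f in monic(d, p):
            for g in monic(n - d, p):
                reducible.add(poly_mul(f, g, p))
    return p ** n - len(reducible)

def weighted_sum(n, p):
    """The sum of d * N_d over the divisors d of n."""
    return sum(d * count_irreducible(d, p) for d in range(1, n + 1) if n % d == 0)

print([count_irreducible(n, 2) for n in range(1, 7)])
```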


8. Let $x$ and $y$ be noncommutative variables. Express in terms of $n$ the
constant term of the expression $(x + y + x^{-1} + y^{-1})^n$.


M. Haiman, D. Richman, AMM 6458 




Proof. Note that variables cancel in variable/inverse pairs so the constant term
is zero for $n$ odd. Let $a_{m,n}$ be the number of products $u_1u_2\cdots u_n$ where each
$u_i$ is one of $\{x, y, x^{-1}, y^{-1}\}$ and after cancellation we are left with a word of
length $m$. Note that $a_{0,n}$ is the desired constant term. By convention we set
$a_{-1,n} = 0$. Starting from a word of length $m > 0$ there are three ways we
can add one more term on the right and produce a word of length $m + 1$ and
one way we can add a cancelling term and produce a word of length $m - 1$.
Therefore we have

$$a_{m,n+1} = \begin{cases}3a_{m-1,n}+a_{m+1,n}, & m\ne 1,\\ 4a_{0,n}+a_{2,n}, & m=1.\end{cases}$$

Letting $A_n = \sum_{m\ge 0}a_{m,n}X^m$ we can rewrite this as

$$A_{n+1} = (3X+X^{-1})A_n+a_{0,n}(X-X^{-1}).$$

This is a first order linear difference equation in $A_n$ and it can be solved by
standard techniques to give

$$A_n = (3X+X^{-1})^n+\sum_{m=0}^{n-1}a_{0,n-m-1}(X-X^{-1})(3X+X^{-1})^m.$$

By definition it is clear that $A_n$ is a polynomial in $X$, therefore the coefficient
of any negative power of $X$ in this expression must be zero. In particular, setting
$n = 2s + 1$ and looking at the coefficient of $X^{-1}$ gives

$$3^s\binom{2s+1}{s} = \sum_{m=0}^{s}\left[3^m\binom{2m}{m}-3^{m-1}\binom{2m}{m-1}\right]a_{0,2s-2m}.$$

Forming generating functions of this identity gives

$$\sum_{s\ge 0}3^s\binom{2s+1}{s}T^s = \left(\sum_{m\ge 0}\left[3^m\binom{2m}{m}-3^{m-1}\binom{2m}{m-1}\right]T^m\right)\left(\sum_{r\ge 0}a_{0,2r}T^r\right).$$

Let $a(T) = \sum_{r\ge 0}a_{0,2r}T^r$. Recognizing the other sums via

$$\sum_{m\ge 0}\binom{2m}{m}T^m = \frac{1}{\sqrt{1-4T}},\qquad \binom{2s+1}{s} = \frac12\binom{2s+2}{s+1}\quad\text{and}\quad\binom{2m}{m-1} = \binom{2m+1}{m}-\binom{2m}{m},$$

we can write this as

$$\frac{1}{6T\sqrt{1-12T}}-\frac{1}{6T} = \left(\frac{1}{\sqrt{1-12T}}-\frac{1-6T}{18T\sqrt{1-12T}}+\frac{1}{18T}\right)a(T).$$

Solving for $a(T)$ and simplifying gives

$$a(T) = \frac{3}{1+2\sqrt{1-12T}} = \frac{2\sqrt{1-12T}-1}{1-16T}.$$

Extracting the coefficient of $T^r$ using the binomial formula gives

$$\begin{aligned}
a_{0,2r} &= 2\sum_{k=0}^{r}16^{r-k}(-12)^k\binom{1/2}{k}-2^{4r}\\
&= 2^{4r}-\sum_{k=1}^{r}2^{4(r-k)+2}\,3^k\cdot\frac1k\binom{2k-2}{k-1}.
\end{aligned}$$

8.2 Proving identities using generating functions 


Generating functions are a very powerful method for proving combinato- 
rial identities. Namely, we compute the generating functions of both sides of 
the identity we are given and we prove that they are the same. Here are a few 
examples. 


9. Prove that for all positive integers $n$,

$$\sum_{k=1}^{n}\binom{n+k-1}{2k-1} = F_{2n},$$

where $F_n$ is the Fibonacci sequence (with $F_1 = F_2 = 1$).
Iranian Olympiad 2008 




Proof. The generating function of the left-hand side is

$$\sum_{n\ge 1}X^n\sum_{k=1}^{n}\binom{n+k-1}{2k-1} = \sum_{k\ge 1}\sum_{n\ge k}X^n\binom{n+k-1}{2k-1}.$$

On the other hand, we have

$$\sum_{n\ge k}\binom{n+k-1}{2k-1}X^n = X^k\sum_{n\ge 0}\binom{n+2k-1}{2k-1}X^n = \frac{X^k}{(1-X)^{2k}}$$

by the binomial formula.
So the generating function of the left-hand side is

$$\sum_{k\ge 1}\frac{X^k}{(1-X)^{2k}} = \frac{X}{X^2-3X+1}.$$

It remains to compute the generating function of the sequence $F_{2n}$. This is
very easy, since we know that $F_n = \frac{x^n-y^n}{x-y}$, where $x, y$ are the roots of the
equation $t^2 - t - 1 = 0$. Thus

$$\begin{aligned}
\sum_{n\ge 1}X^nF_{2n} &= \frac{1}{x-y}\left(\frac{x^2X}{1-x^2X}-\frac{y^2X}{1-y^2X}\right)\\
&= \frac{X(x^2-y^2)}{(x-y)(1-x^2X)(1-y^2X)}.
\end{aligned}$$

Since $x + y = 1$, $xy = -1$, we easily deduce that

$$(1-x^2X)(1-y^2X) = X^2-3X+1$$

and the result follows. □
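
The identity is simple to test numerically. A minimal sketch, with a hypothetical helper `fibonacci` using the normalization $F_1=F_2=1$:

```python
from math import comb

def fibonacci(n):
    """F_n with F_0 = 0 and F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def left_hand_side(n):
    """Sum of C(n + k - 1, 2k - 1) for k = 1, ..., n."""
    return sum(comb(n + k - 1, 2 * k - 1) for k in range(1, n + 1))

print([(left_hand_side(n), fibonacci(2 * n)) for n in range(1, 8)])
```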


10. Let $n$ and $k$ be positive integers. For any sequence of nonnegative integers
$(a_1, a_2, \dots, a_k)$ adding up to $n$, compute the product $a_1a_2\cdots a_k$.
Prove that the sum of all these products is

$$\frac{n(n^2-1^2)(n^2-2^2)\cdots(n^2-(k-1)^2)}{(2k-1)!}.$$




Proof. Let $f(n,k)$ be the desired sum of products and consider the generating
function

$$\begin{aligned}
g(X) &= \sum_{n\ge 0}f(n,k)X^n = \sum_{n\ge 0}\ \sum_{a_1+\cdots+a_k=n}a_1\cdots a_kX^n\\
&= \sum_{a_1,a_2,\dots,a_k\ge 0}a_1X^{a_1}\cdots a_kX^{a_k} = \Big(\sum_{i\ge 0}iX^i\Big)^k\\
&= \left(\frac{X}{(1-X)^2}\right)^k = X^k(1-X)^{-2k}.
\end{aligned}$$

Expanding $(1 - X)^{-2k}$ thanks to the binomial formula and using the previous
relation, we deduce that

$$f(n,k) = (-1)^{n-k}\binom{-2k}{n-k} = \binom{n+k-1}{2k-1}.$$

On the other hand, it is easy to check that

$$\frac{n(n^2-1^2)(n^2-2^2)\cdots(n^2-(k-1)^2)}{(2k-1)!} = \binom{n+k-1}{2k-1},$$

from where the result follows. □
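
Here is a small brute-force check of $f(n,k)=\binom{n+k-1}{2k-1}$; the recursive helper `product_sum` is an illustrative name and peels off the first term $a_1$ (terms equal to 0 contribute nothing, so the loop starts at 1).

```python
from math import comb

def product_sum(n, k):
    """Sum of a_1 * ... * a_k over nonnegative integer sequences with a_1 + ... + a_k = n."""
    if k == 0:
        return 1 if n == 0 else 0   # empty product, only valid when nothing is left
    return sum(a * product_sum(n - a, k - 1) for a in range(1, n + 1))

print([[product_sum(n, k) for k in range(1, 4)] for n in range(1, 6)])
```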


Remark 8.5. Here are a few other such identities that can be easily proved 
using generating functions: 


a) $\sum_{a_1+a_2+\cdots+a_k=n} a_1a_2\cdots a_k = F_{2n}$, where the sum is taken over all
ordered partitions of $n$ and $F_n$ is the $n$th Fibonacci number.


b) $\sum_{a_1+a_2+\cdots+a_k=n} (2^{a_1-1}-1)(2^{a_2-1}-1)\cdots(2^{a_k-1}-1) = F_{2n-2}$.
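Both identities of the remark can be checked by listing all ordered partitions of $n$ into positive parts (with a varying number $k$ of parts). The sketch below is ours, with hypothetical helper names `compositions` and `fib`, and assumes the convention $F_0 = 0$, $F_1 = F_2 = 1$.

```python
from math import prod

def fib(n):
    """Fibonacci number with F_0 = 0, F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def compositions(n):
    """Yield all ordered partitions of n into positive parts."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def identity_a(n):
    return sum(prod(c) for c in compositions(n))

def identity_b(n):
    return sum(prod(2 ** (a - 1) - 1 for a in c) for c in compositions(n))

for n in range(1, 13):
    assert identity_a(n) == fib(2 * n)
    assert identity_b(n) == fib(2 * n - 2)
```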


We continue with a very nice combinatorial identity, for which we give 
two proofs: a natural one using generating functions and a more subtle, purely 
combinatorial one. 




11. Let $m, n$ be positive integers with $m \ge n$, and let $S$ be the set of all
sequences of positive integers $(a_1, a_2, \dots, a_n)$ such that $a_1 + a_2 + \cdots + a_n = m$.
Show that
$$\sum_{(a_1,\dots,a_n)\in S} 1^{a_1}2^{a_2}\cdots n^{a_n} = \sum_{i=1}^{n} (-1)^{n-i}\binom{n}{i} i^m.$$
Palmer Mebane, USA TST 2010


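Proof. The solution using generating functions is rather straightforward.
Indeed, note that
$$\sum_{m\ge n} X^m \sum_{(a_1,\dots,a_n)\in S} 1^{a_1}2^{a_2}\cdots n^{a_n}
= \sum_{a_1,\dots,a_n\ge 1} X^{a_1}(2X)^{a_2}\cdots(nX)^{a_n}
= \prod_{i=1}^{n}\frac{iX}{1-iX}
= \frac{n!\,X^n}{(1-X)(1-2X)\cdots(1-nX)}.$$
On the other hand, we have
$$\sum_{m\ge n}\left(\sum_{i=1}^{n}(-1)^{n-i}\binom{n}{i}i^m\right)X^m
= \sum_{i=1}^{n}(-1)^{n-i}\binom{n}{i}\frac{(iX)^n}{1-iX}.$$
Thus, it is enough to prove that
$$\frac{n!\,X^n}{(1-X)(1-2X)\cdots(1-nX)} = \sum_{i=1}^{n}(-1)^{n-i}\binom{n}{i}\frac{(iX)^n}{1-iX}.$$
But the theory of simple fractions decomposition shows that there exist
$A_1, A_2, \dots, A_n$ such that
$$\frac{n!}{(1-X)(1-2X)\cdots(1-nX)} = \sum_{i=1}^{n}\frac{A_i}{1-iX}.$$
Multiplying this by $1 - iX$ and taking $X \to \frac{1}{i}$ yields the expression of $A_i$, and
a trivial computation shows that $A_i = (-1)^{n-i}\binom{n}{i}i^n$. □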


Proof. Define $k = m - n$ and let $T$ be the set of sequences of nonnegative
integers $(b_1, b_2, \dots, b_n)$ such that $b_1 + b_2 + \cdots + b_n = k$. The substitution
$b_i = a_i - 1$ transforms the desired identity into
$$n!\sum_{(b_1,\dots,b_n)\in T} 1^{b_1}2^{b_2}\cdots n^{b_n} = \sum_{j=1}^{n}(-1)^{n-j}j^{k+n}\binom{n}{j}.$$
To prove this, we will count in two ways the colorings of $k + n$ objects with $n$
colors such that each color is used at least once.

The first method of counting is to line up the objects in some order and
to consider the first appearance of each color. Suppose object $c_i$ is the first
appearance of the $i$th color (in order of occurrence). Let $b_i = c_{i+1} - c_i - 1$
(with $c_{n+1} = n + k + 1$) be the number of objects between $c_i$ and $c_{i+1}$. Any
choice of $(b_1, b_2, \dots, b_n)$ with $b_1 + b_2 + \cdots + b_n = k$ also gives a valid choice
of the $c_i$'s via $c_i = i + \sum_{j=1}^{i-1} b_j$. Now for the $b_i$ objects in between the first
appearance of color $i$ and color $i + 1$, there are $i$ ways to color each one since
only $i$ colors have appeared so far. Therefore, for each choice of $(b_1, b_2, \dots, b_n)$
the number of ways is $1^{b_1}2^{b_2}\cdots n^{b_n}$. Summing over all possible cases and
taking into account that the order the colors appear in can be rearranged in
$n!$ ways, we get $n!\sum 1^{b_1}2^{b_2}\cdots n^{b_n}$ ways to color our objects.

On the other hand, there are $n^{k+n}$ ways to color $k + n$ objects with $n$
colors and there are $\binom{n}{n-1}(n-1)^{k+n}$ ways to do the coloring with $n - 1$ of
the $n$ colors, $\binom{n}{n-2}(n-2)^{k+n}$ ways with at most $n - 2$ colors and so on. A
standard inclusion-exclusion argument shows that the number of admissible
colorings is
$$\binom{n}{n}n^{k+n} - \binom{n}{n-1}(n-1)^{k+n} + \cdots + (-1)^{n-2}\binom{n}{2}2^{k+n} + (-1)^{n-1}\binom{n}{1}$$
and we are done. □
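For small parameters the identity of problem 11 can be confirmed by direct enumeration. The Python sketch below is ours (names `lhs` and `rhs` are illustrative); it loops over all $n$-tuples of positive integers with sum $m$.

```python
from itertools import product
from math import comb, prod

def lhs(m, n):
    """Sum of 1^{a_1} 2^{a_2} ... n^{a_n} over positive n-tuples with sum m."""
    total = 0
    for a in product(range(1, m + 1), repeat=n):
        if sum(a) == m:
            total += prod(i ** e for i, e in enumerate(a, start=1))
    return total

def rhs(m, n):
    """Inclusion-exclusion count of surjections from an m-set onto an n-set."""
    return sum((-1) ** (n - i) * comb(n, i) * i ** m for i in range(1, n + 1))

for n in range(1, 5):
    for m in range(n, 8):
        assert lhs(m, n) == rhs(m, n)
```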




8.3 Recurrence relations 


Generating functions are an extremely powerful tool to deal with the 
complicated recurrence relations that appear quite often in enumerative com- 
binatorics. The reader is warned that the problems in this section are rather 
technical and difficult. 


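12. Let $A_1 = \emptyset$, $B_1 = \{0\}$ and
$$A_{n+1} = \{x + 1 \mid x \in B_n\}, \qquad B_{n+1} = (A_n \cup B_n) \setminus (A_n \cap B_n).$$
Find all positive integers $n$ such that $B_n = \{0\}$.
Chinese Olympiad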


Proof. Using the relations given in the statement of the problem, it is
immediate to check that
$$1_{B_{n+1}}(k) \equiv 1_{B_n}(k) + 1_{B_{n-1}}(k-1) \pmod 2,$$
where $1_A(x) = 1$ if $x \in A$ and $0$ otherwise. This relation suggests considering
the sequence of polynomials defined by
$$P_0 = 0, \quad P_1 = 1, \quad P_n(X) = P_{n-1}(X) + XP_{n-2}(X).$$
Looking at the two recursive relations, it is clear that, modulo 2, the coefficient
of $X^k$ in $P_n(X)$ is simply $1_{B_n}(k)$. To solve this recursion, it is convenient to
introduce a new variable $T$ and set $X = T + T^2$. To avoid confusion let
$Q_n(T) = P_n(T + T^2)$. Then this relation becomes
$$Q_n(T) = Q_{n-1}(T) + T(1+T)Q_{n-2}(T).$$
This relation is easy to solve mod 2, giving
$$Q_n(T) = (T+1)^n + T^n \in \mathbb{F}_2[T].$$
It is obvious either from the original recursion or because $Q_n(T) = Q_n(T+1)$
(mod 2), that $Q_n(T)$ can actually be written mod 2 as a polynomial $P_n(T+T^2)$.
Explicitly finding $P_n$ might be tedious; however, we do not need to do so since
we only care about when $B_n = \{0\}$, or equivalently $P_n(X) = 1$ or $Q_n(T) = 1$.
It is easy to see from the formula above that this occurs if and only if $n$ is
a power of 2. More explicitly, from Legendre's formula it follows that the
smallest $k > 0$ for which $\binom{n}{k}$ is odd is $k = 2^{v_2(n)}$. Therefore $Q_n(T)$ has degree
$n - 2^{v_2(n)}$ as a polynomial in $T$ and $P_n(X)$ has degree $\frac{1}{2}\left(n - 2^{v_2(n)}\right)$ as a
polynomial in $X$. Therefore $Q_n(T) = 1$ if and only if $n$ is a power of 2. □
Proof. As above, we consider the sequence of polynomials defined by $P_0 = 0$,
$P_1 = 1$ and
$$P_n(X) = P_{n-1}(X) + XP_{n-2}(X)$$
and we observe that, modulo 2, the coefficient of $X^k$ in $P_n(X)$ is simply $1_{B_n}(k)$.
We consider the generating function $P(X,Y) = \sum_{n\ge 0} P_n(X)Y^n$ and observe
that
$$(1 - Y - XY^2)P(X,Y) = P_0(X) + (P_1(X) - P_0(X))Y + \sum_{n\ge 2}\left(P_n(X) - P_{n-1}(X) - XP_{n-2}(X)\right)Y^n = Y,$$
so we get
$$P(X,Y) = \frac{Y}{1 - Y - XY^2} = \sum_{m\ge 0} Y^{m+1}(1+XY)^m
= \sum_{m\ge 0}\sum_{j=0}^{m}\binom{m}{j}X^jY^{m+j+1}
= \sum_{n\ge 0}\sum_{j}\binom{n-j-1}{j}X^jY^n.$$
Thus the problem is reduced to finding all $n$ such that $\binom{n-j-1}{j}$ is even for
all $n > j > 0$. We will prove that this happens if and only if $n$ is a power of 2.

Consider the representation of $n$ in the form $o\,2^m$ where $o$ is an odd number.
If $n$ is not a power of 2, then $o \ge 3$ and we have that $\binom{n-2^m-1}{2^m}$ is odd. If $n$ is
a power of 2, then we need to prove that $\binom{n-j-1}{j}$ is even no matter which $j$
we choose. Using the Legendre formula we are done if we can find an $m$ for
which
$$\left\lfloor\frac{n-j-1}{2^m}\right\rfloor > \left\lfloor\frac{j}{2^m}\right\rfloor + \left\lfloor\frac{n-2j-1}{2^m}\right\rfloor.$$
We can choose $m$ to be one more than the binary
logarithm of the highest power of 2 that divides $j$. Thus the condition is
that $n$ is a power of 2. □
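The polynomial recursion used in both proofs is easy to simulate over $\mathbb{F}_2$. In the sketch below (ours; the names `p_mod2` and `is_power_of_two` are illustrative) a polynomial mod 2 is stored as an integer bitmask, bit $k$ being the coefficient of $X^k$, so that addition is XOR and multiplication by $X$ is a left shift.

```python
def p_mod2(n_max):
    """Return [P_0, ..., P_{n_max}] reduced mod 2, encoded as bitmasks."""
    ps = [0, 1]  # P_0 = 0, P_1 = 1
    for _ in range(2, n_max + 1):
        # P_n = P_{n-1} + X * P_{n-2} over F_2
        ps.append(ps[-1] ^ (ps[-2] << 1))
    return ps

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

ps = p_mod2(64)
for n in range(1, 65):
    # B_n = {0} exactly when P_n = 1 mod 2, i.e. when the bitmask equals 1.
    assert (ps[n] == 1) == is_power_of_two(n)
    # Degree predicted by the first proof: (n - 2^{v_2(n)}) / 2.
    assert ps[n].bit_length() - 1 == (n - (n & -n)) // 2
```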


The following problem is very technical, and so is its solution. But problems
of this kind appear quite often in real life, and we think it is rather
important to be able to deal with them.


13. Suppose that $a_0 = a_1 = 1$ and $(n+3)a_{n+1} = (2n+3)a_n + 3na_{n-1}$ for
$n \ge 1$. Prove that all terms of this sequence are integers.


KöMaL


Proof. The first step is to consider the generating function
$$f(X) = \sum_{n\ge 0} a_nX^n.$$
Multiplying the recursive relation by $X^n$ and adding these relations yields
$$\sum_{n\ge 1}(n+3)a_{n+1}X^n = \sum_{n\ge 1}(2n+3)a_nX^n + \sum_{n\ge 1}3na_{n-1}X^n.$$
Now, since $f'(X) = \sum_{n\ge 1} na_nX^{n-1}$, it is very easy to express each of the
previous sums in terms of $f$ and $f'$. More precisely, a straightforward
computation yields
$$\sum_{n\ge 1}(n+3)a_{n+1}X^n = f'(X) + \frac{2}{X}(f(X) - 1) - 3,$$
$$\sum_{n\ge 1}(2n+3)a_nX^n = 2Xf'(X) + 3f(X) - 3 \quad\text{and}$$
$$\sum_{n\ge 1}3na_{n-1}X^n = 3Xf(X) + 3X^2f'(X).$$
Replacing these expressions in the previous equality and collecting similar
terms, we end up with the differential equation
$$f'(X)(X - 2X^2 - 3X^3) + f(X)(2 - 3X - 3X^2) - 2 = 0.$$
Now comes the technical part, which we will skip: solving this differential
equation. There are standard methods to solve this, but unfortunately when
one uses them in this case, one obtains rather horrendous expressions. Thus,
we leave it to the reader to check that solving this equation yields
$$f(X) = \frac{1 - X - \sqrt{1 - 2X - 3X^2}}{2X^2}.$$
And now? It is absolutely not clear that the coefficients in the Taylor
expansion of $f$ are integers! The tricky point is to write
$$(1 - X - 2X^2f(X))^2 = 1 - 2X - 3X^2 \iff 1 + (X-1)f(X) + X^2f(X)^2 = 0.$$
And now we are saved, because if we identify the coefficients in the previous
relation, we obtain the following recursive relation
$$a_{n+2} = a_{n+1} + \sum_{k=0}^{n} a_ka_{n-k}.$$
And since the first terms are integers, the previous relation shows inductively
that all terms of the sequence are integers. Another elegant way, found by
Richard Stong, is to note that
$$f(X) = \frac{1-X}{2X^2}\left(1 - \sqrt{1 - \frac{4X^2}{(1-X)^2}}\right)
= \sum_{k\ge 0}\frac{1}{k+1}\binom{2k}{k}\frac{X^{2k}}{(1-X)^{2k+1}}
= \sum_{n\ge 0}\left(\sum_{k\ge 0}\frac{1}{k+1}\binom{2k}{k}\binom{n}{2k}\right)X^n$$
and that $\frac{1}{k+1}\binom{2k}{k} = \binom{2k}{k} - \binom{2k}{k+1}$ is an integer (this is the famous Catalan
number). We leave as a challenge to the reader to prove directly from the
recursive relation that
$$a_{n+2} = a_{n+1} + \sum_{k=0}^{n} a_ka_{n-k}$$
or that
$$a_n = \sum_{k\ge 0}\frac{1}{k+1}\binom{2k}{k}\binom{n}{2k}.$$
He will probably appreciate the power of generating functions (as far as we
know, the proofs by induction are hideous, to say the least...). □
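The three descriptions of the sequence in problem 13 (the original recurrence, the convolution recurrence and the Catalan-type sum) can be compared numerically. The Python sketch below is ours; it uses exact rational arithmetic for the original recurrence so that integrality is actually tested rather than assumed.

```python
from fractions import Fraction
from math import comb

def by_recurrence(n_max):
    """a_0..a_{n_max} from (n+3) a_{n+1} = (2n+3) a_n + 3n a_{n-1}."""
    a = [Fraction(1), Fraction(1)]
    for n in range(1, n_max):
        a.append(((2 * n + 3) * a[n] + 3 * n * a[n - 1]) / (n + 3))
    return a

def by_convolution(n_max):
    """a_0..a_{n_max} from a_{n+2} = a_{n+1} + sum_{k=0}^{n} a_k a_{n-k}."""
    a = [1, 1]
    for n in range(0, n_max - 1):
        a.append(a[n + 1] + sum(a[k] * a[n - k] for k in range(n + 1)))
    return a

def by_catalan_sum(n):
    """sum over k of Catalan(k) * C(n, 2k)."""
    return sum(comb(2 * k, k) // (k + 1) * comb(n, 2 * k) for k in range(n // 2 + 1))

rec = by_recurrence(30)
assert all(x.denominator == 1 for x in rec)          # all terms are integers
assert [int(x) for x in rec] == by_convolution(30)   # same sequence
assert all(rec[n] == by_catalan_sum(n) for n in range(31))
```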


Remark 8.6. The numbers appearing in the previous problem are called
Motzkin numbers and they have nice combinatorial interpretations. Here is
one of them: a Motzkin path is a lattice path from $(0,0)$ to $(n,0)$ with steps
$(1,0)$, $(1,1)$ and $(1,-1)$, never going below the $x$-axis. Then $a_n$ is the number
of Motzkin paths of length $n$. This is a beautiful and not obvious exercise that
we leave to the reader. Another good exercise is to prove that $a_n$ is also the
number of paths on $\mathbb{N}$ with $n$ steps, each step being $-1$, $0$ or $1$, starting and
ending at $0$. To make everything precise, let us recall that if $A$ is a subset of
$\mathbb{Z}^d$, then a lattice path of length $l$ from $x \in \mathbb{Z}^d$ to $y \in \mathbb{Z}^d$, with steps in $A$, is
a sequence $v_0 = x, v_1, \dots, v_l = y$ of elements of $\mathbb{Z}^d$ such that $v_i - v_{i-1} \in A$ for
all $i$.


We give three different solutions for the following beautiful problem. The 
first two proofs are a bit technical, but quite natural. The third one is very 
short and elegant, but not very natural. 


14. Consider $(b_n)_{n\ge 1}$ a sequence of integers such that $b_1 = 0$ and define
$a_1 = 0$ and $a_n = nb_n + a_1b_{n-1} + \cdots + a_{n-1}b_1$ for all $n \ge 2$. Prove that
$p \mid a_p$ for any prime number $p$.


KöMaL




Proof. Considering the generating functions
$$A(X) = \sum_{n\ge 2} a_nX^n, \qquad B(X) = \sum_{n\ge 2} b_nX^n,$$
the recursive relation can also be written in the form
$$A(X) = XB'(X) + A(X)B(X),$$
from where we obtain the fundamental identity
$$A(X) = \frac{XB'(X)}{1 - B(X)} = -X\frac{d}{dX}\log(1 - B(X)).$$
Now, we have the following very useful result:


Lemma 8.7. Any $f \in \mathbb{Z}[[X]]$ with constant term 1 can be written in the form
$$f = \prod_{i\ge 1}(1 - a_iX^i)$$
with $a_i \in \mathbb{Z}$.


Proof. Write
$$f(X) = 1 + f_1X + f_2X^2 + \cdots$$
for some integers $f_i$. Looking at the coefficient of $X$ we obtain that $a_1 = -f_1$.
In general, we find $a_n$ in terms of $a_1, \dots, a_{n-1}$ by imposing the condition that
$$(1 - a_1X)^{-1}\cdots(1 - a_nX^n)^{-1}f(X) \equiv 1 \pmod{X^{n+1}}.$$
This expresses $a_n$ as a polynomial with integer coefficients in $a_1, \dots, a_{n-1}$ and
$f_1, \dots, f_n$, from where the conclusion follows immediately by induction. □


Using this result, let us write
$$1 - B(X) = \prod_{i\ge 1}(1 - c_iX^i)$$
with $c_i \in \mathbb{Z}$. Then
$$A(X) = -X\sum_{i\ge 1}\frac{d}{dX}\log(1 - c_iX^i)
= \sum_{i\ge 1}\frac{ic_iX^i}{1 - c_iX^i}
= \sum_{i\ge 1}\left(ic_iX^i + ic_i^2X^{2i} + \cdots\right).$$
Since $a_p$ is the coefficient of $X^p$ in $A(X)$, the only contribution comes from
the terms with $i = 1$ and $i = p$, and since the contribution for $i = p$ is clearly
a multiple of $p$, it is enough to check that $c_1 = 0$. But this is clear, since
$1 - B(X) \equiv 1 \pmod{X^2}$. □
Proof. Let us modify the definitions of the generating functions a bit, by
imposing different initial terms. Namely, define $a_0 = a_1 = 0$, $b_0 = -1$, $b_1 = 0$
and
$$A(X) = \sum_{i\ge 0} a_iX^i, \qquad B(X) = \sum_{i\ge 0} -b_iX^i.$$
Then the recursive relation satisfied by the sequences implies that
$$A(X)B(X) = -XB'(X),$$
so that
$$\frac{A(X)}{X} = -\frac{B'(X)}{B(X)}.$$
Integrating both sides we get
$$-\sum_{i\ge 2}\frac{a_i}{i}X^i = \log B(X) = \log\left(1 - (b_2X^2 + b_3X^3 + \cdots)\right).$$
Since
$$-\log(1 - u) = \sum_{i\ge 1}\frac{u^i}{i},$$
we deduce that
$$\sum_{i\ge 2}\frac{a_i}{i}X^i = \sum_{i\ge 1}\frac{(b_2X^2 + b_3X^3 + \cdots)^i}{i}.$$
Looking at the coefficients of $X^p$ in both sides shows that $\frac{a_p}{p}$ is the coefficient
of $X^p$ in $\sum_{i=1}^{p-1}\frac{(b_2X^2 + b_3X^3 + \cdots)^i}{i}$. Since the denominator of this coefficient is
clearly relatively prime to $p$, it follows that $p \mid a_p$. □


Proof. Let $x_1, x_2, \dots, x_p$ be the roots of the polynomial
$$X^p - b_1X^{p-1} - b_2X^{p-2} - b_3X^{p-3} - \cdots - b_p.$$
Comparing the identities $a_1 = b_1$ and $a_n = nb_n + a_1b_{n-1} + \cdots + a_{n-1}b_1$ for all
$n \ge 2$ with Newton's identities, we easily deduce that
$$a_i = x_1^i + x_2^i + \cdots + x_p^i$$
for every $i \in \{1, 2, \dots, p\}$. On the other hand, $b_1 = 0$ implies that
$$x_1 + x_2 + \cdots + x_p = 0.$$
The desired result follows then from corollary 9.15. □
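A quick experiment makes the statement of problem 14 plausible before any proof is read. The sketch below (ours; `a_sequence` is an illustrative name) computes $a_n$ directly from the defining recursion for random integer sequences $(b_n)$ with $b_1 = 0$ and checks divisibility by the first few primes.

```python
import random

def a_sequence(b, n_max):
    """Given b[1..n_max] with b[1] = 0 (b[0] is unused), return a[0..n_max]
    with a[1] = 0 and a_n = n b_n + a_1 b_{n-1} + ... + a_{n-1} b_1."""
    a = [0] * (n_max + 1)
    for n in range(2, n_max + 1):
        a[n] = n * b[n] + sum(a[i] * b[n - i] for i in range(1, n))
    return a

rng = random.Random(0)  # fixed seed so that the experiment is reproducible
for _ in range(20):
    b = [0, 0] + [rng.randint(-5, 5) for _ in range(2, 31)]
    a = a_sequence(b, 30)
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        assert a[p] % p == 0
```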


8.4 Additive properties 


When studying additive properties of sets $A \subset \mathbb{Z}$, it is very useful to
consider the generating function $f = \sum_{a\in A} X^a$. For instance, the square of $f$
encodes the number of solutions to the equation $a + b = n$, with $a, b \in A$. This
rather innocent-looking observation yields quite a lot of nontrivial results and
the purpose of this section is to present a few of them.

Quite often, it is more convenient to reduce the generating function modulo
suitable primes, as this simplifies the computations a lot: it is a very
fortunate feature of $\mathbb{F}_p$ that $(1+X)^p = 1 + X^p$ in $\mathbb{F}_p[X]$.




15. Let $A$ be a finite set of nonnegative integers. Define a sequence of sets
by: $A_0 = A$ and for all $n \ge 0$, an integer $a$ is in $A_{n+1}$ if and only if
exactly one of the integers $a - 1$ and $a$ is in $A_n$. Prove that for infinitely
many positive integers $k$, $A_k$ is the union of $A$ with the set of numbers
of the form $k + a$ with $a \in A$.


Putnam Competition 2000


Proof. The key point is to note that the definition of $A_{n+1}$ in terms of $A_n$ can
be expressed algebraically by
$$\sum_{a\in A_{n+1}} X^a = (1+X)\sum_{a\in A_n} X^a$$
for all $n$, the equality being in $\mathbb{F}_2[X]$. We deduce that
$$\sum_{a\in A_n} X^a = (1+X)^n\sum_{a\in A} X^a$$
for all $n$. On the other hand, the condition that $A_n$ is the union of $A$ and
$A + n$ can be also written (when $n > \max A$) in the form
$$\sum_{a\in A_n} X^a = (1+X^n)\sum_{a\in A} X^a.$$
Thus, it suffices to find infinitely many $n$ such that $(1+X)^n = 1 + X^n$ in
$\mathbb{F}_2[X]$. Simply take for $n$ a power of 2 greater than $\max A$ and we will have
$A_n = A \cup (A + n)$. □
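The algebraic reformulation translates directly into bit operations: if a finite set of nonnegative integers is stored as a bitmask, multiplication by $1 + X$ in $\mathbb{F}_2[X]$ is `mask ^ (mask << 1)`. The following Python sketch (ours, with illustrative names) iterates the rule and checks the conclusion for powers of 2 exceeding $\max A$.

```python
def step(mask):
    """One step A_n -> A_{n+1}: bit a of the result is set iff exactly one
    of a - 1 and a belongs to the set encoded by mask."""
    return mask ^ (mask << 1)

def iterate(mask, n):
    for _ in range(n):
        mask = step(mask)
    return mask

def encode(A):
    return sum(1 << a for a in A)

A = {0, 2, 3, 7}
mask = encode(A)
for n in (8, 16, 32, 64):  # powers of 2 greater than max A = 7
    # A and A + n are disjoint here, so their union is a bitwise OR.
    assert iterate(mask, n) == mask | (mask << n)
```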


The following problem is an absolute classic. 


16. Prove that if we partition the set of nonnegative integers into a finite 
number of infinite arithmetic progressions, then there will be two of them 
having the same common difference. 


Proof. Suppose that $\mathbb{N}$ is partitioned into the arithmetic progressions $a_i\mathbb{N} + b_i$,
with $a_i, b_i \ge 0$ and $a_1 \le a_2 \le \cdots \le a_n$. We will actually prove that $a_{n-1} = a_n$.
Suppose that this is not the case and note that the partition hypothesis yields
an identity of formal series
$$\frac{1}{1-X} = \sum_{i=1}^{n}\sum_{k\ge 0} X^{a_ik+b_i} = \sum_{i=1}^{n}\frac{X^{b_i}}{1-X^{a_i}}.$$
However, the right-hand side has a pole at a primitive $a_n$th root of unity, while
the left-hand side definitely does not have such a pole. □
The following problem is much trickier. 


17. Let $p$ be a prime and let $n \ge p$ and $a_1, a_2, \dots, a_n$ be integers. Define
$f_0 = 1$ and $f_k$ the number of subsets $B \subset \{1, 2, \dots, n\}$ having $k$ elements
and such that $p$ divides $\sum_{i\in B} a_i$. Show that $f_0 - f_1 + f_2 - \cdots + (-1)^nf_n$
is a multiple of $p$.


Saint Petersburg 2003


Proof. Adding to all of the $a_i$'s a suitable large multiple of $p$ (which does
not affect the hypothesis or the conclusion), we may assume that the $a_i$'s are
positive. The point is to consider the remainder of
$$f(X) = (1 - X^{a_1})(1 - X^{a_2})\cdots(1 - X^{a_n})
= \sum_{k=0}^{n}(-1)^k\sum_{\substack{B\subset\{1,\dots,n\}\\ |B|=k}} X^{m(B)} \in \mathbb{F}_p[X]$$
modulo $X^p - 1$, where $m(B) = \sum_{i\in B} a_i$. On the one
hand, the hypothesis $n \ge p$ and the fact that $1 - X$ divides $1 - X^{a_i}$ imply that
$1 - X^p = (1 - X)^p$ (remember that we are working in $\mathbb{F}_p[X]$) divides $f(X)$.
On the other hand, we have $X^N \equiv X^M \pmod{X^p - 1}$ if $N \equiv M \pmod p$,
thus
$$f(X) \equiv \sum_{j=0}^{p-1}\left(\sum_{k=0}^{n}(-1)^k\sum_{\substack{|B|=k\\ m(B)\equiv j \ (\mathrm{mod}\ p)}} 1\right)X^j \pmod{X^p - 1}.$$
Since this polynomial is 0, it follows that its constant term is 0 in $\mathbb{F}_p$. But this
is precisely saying that $p$ divides $f_0 - f_1 + f_2 - \cdots + (-1)^nf_n$. □
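For small $n$ the alternating sum can be computed by brute force over all subsets. The following Python sketch is ours (the name `alternating_count` is illustrative) and simply tests the divisibility on random inputs with $n \ge p$.

```python
import random
from itertools import combinations

def alternating_count(a, p):
    """f_0 - f_1 + f_2 - ... + (-1)^n f_n, where f_k counts the k-element
    index sets B with p dividing the sum of a_i over i in B."""
    n = len(a)
    total = 0
    for k in range(n + 1):
        f_k = sum(1 for B in combinations(range(n), k)
                  if sum(a[i] for i in B) % p == 0)
        total += (-1) ** k * f_k
    return total

rng = random.Random(0)
for p in (2, 3, 5, 7):
    for n in range(p, p + 4):          # the hypothesis n >= p matters
        a = [rng.randint(-20, 20) for _ in range(n)]
        assert alternating_count(a, p) % p == 0
```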




The following is also a rather tricky problem. We found it in the wonderful 
little book [62]. It was also proposed in a Chinese Team Selection Test in 2002. 


18. For which positive integers $n$ can we find real numbers $a_1, a_2, \dots, a_n$
such that
$$\{|a_i - a_j| : 1 \le i < j \le n\} = \left\{1, 2, \dots, \binom{n}{2}\right\}?$$


Proof. It is clear that if the $a_i$ are such numbers, then all their differences are
integers, thus we may actually assume that they are integers. Let
$$f(X) = X^{a_1} + X^{a_2} + \cdots + X^{a_n},$$
so that, writing $N = \binom{n}{2}$, the hypothesis can be written
$$f(X)f\left(\frac{1}{X}\right) = n - 1 + \sum_{k=-N}^{N} X^k = n - 1 + \frac{X^{N+1} - X^{-N}}{X - 1}.$$
The point is to look at values of $f$ at points on the unit circle, since for
$|z| = 1$ we have $f\left(\frac{1}{z}\right) = f(\bar z) = \overline{f(z)}$, thus $f(z)f\left(\frac{1}{z}\right) = |f(z)|^2 \ge 0$. We deduce
that for any $z = e^{ix} \ne 1$ we have
$$n - 1 + \frac{z^{N+1} - z^{-N}}{z - 1} \ge 0.$$
However, an easy computation shows that this is equivalent to
$$n - 1 + \frac{\sin\left((n^2-n+1)\frac{x}{2}\right)}{\sin\frac{x}{2}} \ge 0.$$
We will take $x$ such that $(n^2-n+1)\frac{x}{2} = \frac{3\pi}{2}$ to deduce that
$$n - 1 \ge \frac{1}{\sin\frac{3\pi}{2(n^2-n+1)}} \ge \frac{2(n^2-n+1)}{3\pi}.$$
However, it is immediate to check that the last inequality cannot hold unless
$n \le 4$. And indeed, for $n \le 4$ such sequences exist. For $n = 1, 2$ things are
clear, while for $n = 3$ and $n = 4$ one can take the sequences $1, 2, 4$, respectively
$1, 2, 5, 7$. □




Proof. It is easy to build examples with $n = 1, 2, 3$, and $4$. Suppose $n \ge 5$ and
without loss of generality that $a_1 < a_2 < \cdots < a_n$. Then the largest difference
must be $a_n - a_1 = \binom{n}{2}$ and since
$$a_n - a_1 = \sum_{k=1}^{n-1}(a_{k+1} - a_k) \ge \sum_{k=1}^{n-1} k = \binom{n}{2}$$
we see that the differences $a_{k+1} - a_k$ for $1 \le k \le n - 1$ must be a permutation
of $1, 2, \dots, n - 1$. In particular, all differences of $a_i$'s with indices two apart are
at least $n$. Suppose $a_{k+1} - a_k = 1$, then if they are present, both $a_{k+2} - a_k =
1 + (a_{k+2} - a_{k+1})$ and $a_{k+1} - a_{k-1} = 1 + (a_k - a_{k-1})$ would be at most $n$,
a contradiction since they are at least $n$ and distinct. Thus the difference
of 1 can only occur at one of the ends $a_2 - a_1$ or $a_n - a_{n-1}$ and it must be
adjacent to the difference of $n - 1$. Since this accounts for the difference of $n$,
all remaining differences of $a_i$'s two apart must be at least $n + 1$. Therefore
repeating the above argument shows that the difference of 2 must also occur
at one of the ends and can only be adjacent to the difference of $n - 1$. But
this is impossible since the ends are at least $n - 1 \ge 4$ apart. □
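The answer $n \le 4$ can also be confirmed by exhaustive search for small $n$. The sketch below is ours (the name `perfect_rulers` is illustrative). It uses the normalization $a_1 = 0$, which is harmless by translation, and the observation from the proof above that the largest difference forces $a_n - a_1 = \binom{n}{2}$.

```python
from itertools import combinations
from math import comb

def perfect_rulers(n):
    """All integer tuples 0 = a_1 < ... < a_n = C(n,2) whose pairwise
    differences are exactly 1, 2, ..., C(n,2)."""
    top = comb(n, 2)
    if n == 1:
        return [(0,)]  # no differences at all, and the target set is empty
    found = []
    for inner in combinations(range(1, top), n - 2):
        a = (0,) + inner + (top,)
        diffs = sorted(y - x for x, y in combinations(a, 2))
        if diffs == list(range(1, top + 1)):
            found.append(a)
    return found

for n in range(1, 8):
    print(n, perfect_rulers(n))
```

In this normalization the examples $1, 2, 4$ and $1, 2, 5, 7$ from the first proof appear as $(0, 1, 3)$ and $(0, 1, 4, 6)$.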


The following gem is one of our favorite problems. 


19. Find all positive integers $n$ with the following property: for any real
numbers $a_1, a_2, \dots, a_n$, knowing the numbers $a_i + a_j$, $i < j$ (but not
knowing which number corresponds to which sum) determines the values
$a_1, a_2, \dots, a_n$ uniquely.


Selfridge and Straus


Proof. We will eventually prove that the answer is: all positive integers but
the powers of 2.

First, let us give a nice counter-example when $n = 2^k$. Let $A_1 = \{0, 3\}$
and $B_1 = \{1, 2\}$ and define inductively
$$A_{j+1} = A_j \cup (2^{j+1} + B_j), \qquad B_{j+1} = B_j \cup (2^{j+1} + A_j).$$
It is an amusing exercise left to the reader to check that $A_j, B_j$ have $2^j$ elements
and that
$$\{a_1 + a_2 \mid a_1 \ne a_2 \in A_j\} = \{b_1 + b_2 \mid b_1 \ne b_2 \in B_j\}$$
for all $j$. Actually, $A_j$ and $B_j$ are precisely the sets of numbers having at most
$j + 1$ digits when written in base 2 and whose sum of digits is even (respectively
odd). All this can be easily proved by induction and shows that $A_k, B_k$ are a
counter-example for $n = 2^k$.
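The inductive construction of the counter-example is short enough to run. In this Python sketch (ours; `build` and `pair_sums` are illustrative names) we compare the sorted lists of pairwise sums, that is, we check the equality with multiplicities, and we also check the description by binary digit sums.

```python
from itertools import combinations

def build(j):
    """Return (A_j, B_j) from A_1 = {0, 3}, B_1 = {1, 2} and the rule
    A_{s+1} = A_s U (2^{s+1} + B_s), B_{s+1} = B_s U (2^{s+1} + A_s)."""
    A, B = {0, 3}, {1, 2}
    for s in range(1, j):
        shift = 2 ** (s + 1)
        # The right-hand side is evaluated with the old A and B.
        A, B = A | {shift + b for b in B}, B | {shift + a for a in A}
    return A, B

def pair_sums(s):
    """Sorted list (with multiplicity) of sums x + y over pairs x < y."""
    return sorted(x + y for x, y in combinations(sorted(s), 2))

for j in range(1, 7):
    A, B = build(j)
    assert len(A) == len(B) == 2 ** j and A != B
    assert pair_sums(A) == pair_sums(B)
    assert all(bin(a).count("1") % 2 == 0 for a in A)
    assert all(bin(b).count("1") % 2 == 1 for b in B)
```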

Let us prove now that if $n$ is not a power of 2 and if the collections
$(a_i + a_j)_{i<j}$ and $(b_i + b_j)_{i<j}$ are identical, then the collections $(a_i)_i$ and $(b_i)_i$
are identical. This is the hard part. We may always assume that $a_i, b_i$ are
positive real numbers, by adding to each of them the same large integer.

Assume first that the $a_i$'s and $b_i$'s are integers and consider the polynomials
$$f(X) = \sum_{i=1}^{n} X^{a_i}, \qquad g(X) = \sum_{i=1}^{n} X^{b_i}.$$
Then the hypothesis implies the equality
$$f(X)^2 - f(X^2) = g(X)^2 - g(X^2).$$
If $f = g$, then we are done, so assume that $h = f - g$ is not the zero polynomial.
Write $h(X) = (X-1)^kp(X)$ for some polynomial $p$ with $p(1) \ne 0$ and some
$k \ge 1$ (note that $f(1) = g(1) = n$). Then
$$(X-1)^kp(X)(f(X) + g(X)) = (X^2-1)^kp(X^2),$$
which, after division by $(X-1)^k$, can be written as
$$p(X)(f(X) + g(X)) = (X+1)^kp(X^2).$$
Taking $X = 1$ in this identity and dividing by $p(1) \ne 0$, we obtain that
$n = 2^{k-1}$. Since this is a contradiction, it follows that $f = g$ and the result is
proved for integers. It follows trivially that the result also holds for rational
numbers (by multiplying them by a suitable positive integer we can make all
of them integers).

In order to prove the general case, we will use Dirichlet's approximation
theorem. Suppose that $n$ is not a power of 2 and that the two collections of
numbers $(a_i + a_j)_{i<j}$ and $(b_i + b_j)_{i<j}$ are the same. Pick any $\varepsilon < \frac{1}{4}$.

By Dirichlet's approximation theorem, we can find integers $p, q_i, r_i$ such
that $p \ne 0$, $|pa_i - q_i| < \varepsilon$ and $|pb_i - r_i| < \varepsilon$ for all $1 \le i \le n$. By the triangle
inequality, any equality $a_i + a_j = b_k + b_l$ forces $q_i + q_j = r_k + r_l$ (indeed, we
have
$$q_i + q_j - (r_k + r_l) = (q_i - pa_i) + (q_j - pa_j) - (r_k - pb_k) - (r_l - pb_l)$$
and each term has magnitude smaller than $1/4$, so the left-hand side is an
integer of absolute value less than 1). Hence the two collections
$(q_i + q_j)_{i<j}$ and $(r_i + r_j)_{i<j}$ are the same and so (by what has already been
done) we have an equality of polynomials $\prod_{i=1}^{n}(X - q_i) = \prod_{i=1}^{n}(X - r_i)$. After
replacing $X$ by $pX$ and dividing by $p^n$, this can also be written as
$$\prod_{i=1}^{n}\left(X - a_i + \frac{pa_i - q_i}{p}\right) = \prod_{i=1}^{n}\left(X - b_i + \frac{pb_i - r_i}{p}\right).$$
By taking $\varepsilon$ smaller and smaller, we obtain sequences $(u_i^{(m)})_m$, $(v_i^{(m)})_m$ which
converge to 0 and such that $\prod_{i=1}^{n}(X - a_i + u_i^{(m)}) = \prod_{i=1}^{n}(X - b_i + v_i^{(m)})$ for all $m$.
It is clear that this forces $\prod_{i=1}^{n}(X - a_i) = \prod_{i=1}^{n}(X - b_i)$, which is the desired
result. □


The following alternative proof is a very elegant argument due to Selfridge 
and Straus ([70]): 


Proof. Assume that $n$ is not a power of 2. Let us prove that the knowledge of
the multiset $(a_i + a_j)_{i<j}$ uniquely determines the symmetric elementary sums
of the $a_i$'s (this in turn uniquely determines the multiset $(a_i)_i$, yielding the
desired result). Using Newton's formulae, it is enough to prove that we can
recover the sums $S_k = a_1^k + a_2^k + \cdots + a_n^k$ from the multiset $(a_i + a_j)_{i<j}$, and
this for every $k$ (it would be enough to take $k \le n$). Note that
$$\sum_{i\ne j}(a_i + a_j)^k = \sum_{1\le i,j\le n}(a_i + a_j)^k - 2^kS_k
= \sum_{i,j=1}^{n}\sum_{l=0}^{k}\binom{k}{l}a_i^{k-l}a_j^{l} - 2^kS_k$$
$$= \sum_{l=0}^{k}\binom{k}{l}S_{k-l}S_l - 2^kS_k
= (2n - 2^k)S_k + \sum_{l=1}^{k-1}\binom{k}{l}S_{k-l}S_l.$$
The left-hand side is twice the sum of $(a_i + a_j)^k$ over $i < j$, so it is known.
As $2n - 2^k \ne 0$ for all $k$, the previous relations show (by induction on $k$) that $S_k$
can be uniquely determined from the sums $\sum_{i\ne j}(a_i + a_j)^k$
and $S_0, S_1, \dots, S_{k-1}$. This is precisely what we needed. □
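The argument of Selfridge and Straus is effectively an algorithm: it recovers the power sums $S_k$ one at a time from the multiset of pairwise sums. Here is a minimal Python sketch of that recursion (ours; the function name and its parameters are illustrative), using exact rational arithmetic and assuming $2n \ne 2^k$ for every $k$ used.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def power_sums_from_pair_sums(pair_sums, n, k_max):
    """Recover [S_0, ..., S_{k_max}] of the hidden numbers a_1..a_n from the
    multiset of sums a_i + a_j (i < j). Requires 2n != 2^k for k <= k_max."""
    S = [Fraction(n)]  # S_0 = n
    for k in range(1, k_max + 1):
        # Sum over ordered pairs i != j is twice the sum over i < j.
        T_k = 2 * sum(Fraction(s) ** k for s in pair_sums)
        cross = sum(comb(k, l) * S[k - l] * S[l] for l in range(1, k))
        S.append((T_k - cross) / (2 * n - 2 ** k))
    return S

a = [Fraction(1, 2), 3, -2, 7, 7]          # n = 5 is not a power of 2
sums = [x + y for x, y in combinations(a, 2)]
recovered = power_sums_from_pair_sums(sums, len(a), 5)
assert recovered == [sum(Fraction(x) ** k for x in a) for k in range(6)]
```

From $S_1, \dots, S_n$ one would then obtain the elementary symmetric sums by Newton's formulae, as in the proof; that last step is not implemented here.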


The last two problems of this chapter are very hard. The first one requires 
a few facts about polynomials over finite fields, for which we refer the reader 
to the addendum 9.A. 


20. Prove that there exists a subset $S$ of $\{1, 2, \dots, n\}$ such that $0, 1, 2, \dots, n-1$
all have an odd number of representations as $x - y$ with $x, y \in S$, if and
only if $2n - 1$ has a multiple of the form $2\cdot 4^k - 1$.


Miklos Schweitzer Competition


Proof. Define $f(X) = \sum_{s\in S} X^{s-1}$. The fact that $S$ satisfies the property in
the statement is equivalent to the following equality in $\mathbb{F}_2[X]$:
$$X^{n-1}f(X)f\left(\frac{1}{X}\right) = 1 + X + \cdots + X^{2n-2}.$$
Indeed, the coefficient of $X^a$ in $f(X)f\left(\frac{1}{X}\right)$ is the number of representations of
$a$ as a difference of two elements of $S$. Conversely, if we can find a polynomial
$f \in \mathbb{F}_2[X]$ satisfying the previous equality, we can define a set $S$ with the
desired property simply by saying that $s \in S$ if and only if $X^{s-1}$ has coefficient
1 in $f$.

Thus, we must find $n$ for which one can find $f \in \mathbb{F}_2[X]$ satisfying the
previous relation. A first crucial remark is that $1 + X + \cdots + X^{2n-2}$ has no
multiple root in the algebraic closure of $\mathbb{F}_2$. Indeed, this is already the case
with $X^{2n-1} - 1$, since this polynomial is relatively prime to its derivative.
Next, the roots of $f$ are precisely the inverses of the roots of $X^{n-1}f\left(\frac{1}{X}\right)$. We
deduce that no irreducible factor of $1 + X + \cdots + X^{2n-2}$ can vanish at $z$ and
$1/z$ for any root $z$ of $1 + X + \cdots + X^{2n-2}$. But conversely, if this property
holds, we can find such $f$. Indeed, we can then pair the irreducible factors of
$1 + X + \cdots + X^{2n-2}$ such that in each pair the roots of the first polynomial
are the inverses of the roots of the second polynomial. One can then take for
$f$ the product of the first components of these pairs.

Now, consider the permutation $z \mapsto z^2$ of the roots of $X^{2n-1} - 1$.
If $C_1, \dots, C_r$ are the cycles of this permutation, the irreducible factors of
$X^{2n-1} - 1$ are the polynomials $\prod_{z\in C_i}(X - z)$. We deduce that the existence
of $S$ is equivalent to: for any root $z \ne 1$ of $X^{2n-1} - 1$, there is no $k$ such that
$z^{2^k} = 1/z$. This is equivalent to $\gcd(X^{2n-1} - 1, X^{2^k+1} - 1) = X - 1$ for all $k \ge 0$
and therefore to $\gcd(2n - 1, 2^k + 1) = 1$ for all $k \ge 1$. Hence the least $s$ such
that $2n - 1 \mid 2^s - 1$ is odd and writing $s = 2k + 1$ we are done. □
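For small $n$ the criterion can be compared with an exhaustive search over all subsets of $\{1, \dots, n\}$. The Python sketch below is ours (both function names are illustrative); the search is exponential in $n$ and is only meant as an experiment.

```python
def has_good_subset(n):
    """Brute force: is there S in {1..n} such that every d in 0..n-1 has an
    odd number of representations d = x - y with x, y in S?"""
    for mask in range(1, 2 ** n):
        S = [s for s in range(1, n + 1) if (mask >> (s - 1)) & 1]
        counts = [0] * n
        for x in S:
            for y in S:
                if x >= y:
                    counts[x - y] += 1
        if all(c % 2 == 1 for c in counts):
            return True
    return False

def criterion(n):
    """Does 2n - 1 divide 2 * 4^k - 1 for some k >= 0?"""
    m = 2 * n - 1
    # 4^k mod m is periodic with period at most m, so k < m suffices.
    return any((2 * pow(4, k, m) - 1) % m == 0 for k in range(m))

for n in range(1, 13):
    assert has_good_subset(n) == criterion(n)
```

In this range the criterion holds exactly for $n = 1, 4, 12$ (where $2n - 1 = 1, 7, 23$).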


Rather strong analytic skills are required to prove the following result, 
which we found in chapter III of the excellent book [62]. It is there considered 
an “appetizer.” 


21. Let $A$ be an infinite set of positive integers. Let $x_n$ be the number of
pairs $(a, b) \in A \times A$ such that $a < b$ and $a + b = n$. Prove that the
sequence $(x_n)_n$ is not eventually constant.


Donald J. Newman


Proof. Consider the generating function $f(X) = \sum_{a\in A} X^a$. Note that $f(z)$
converges for all $|z| < 1$ and $z \mapsto f(z)$ is a continuous function on the open unit
disk. Moreover, the hypothesis that $x_n$ is eventually constant is equivalent to
the existence of a constant $c$ and of a polynomial $P$ such that
$$f(X)^2 - f(X^2) = \frac{c}{1-X} + P(X).$$
Note that $c > 0$, since $A$ is infinite.

We will prove that this cannot happen. The idea is to look at the behavior
of $f$ on the real axis, close to 1, and then at its average behavior on circles of
radius tending to 1. To avoid introducing too many functions, we will write
$f \gg g$ (when $f, g$ are positive functions on $(0,1)$) if there exists a constant
$c > 0$ such that $f(r) \ge cg(r)$ for all $r$ in a neighborhood of 1. In particular,
we have $|P| \ll 1$ and since $f(x^2) \ge 0$ for $0 < x < 1$, we deduce that
$f(x)^2 \ge \frac{c}{1-x} + P(x)$ and so $f(x) \gg \frac{1}{\sqrt{1-x}}$. So $f$ grows quite fast on the real
axis, near 1.

Now we integrate the relation satisfied by $f$ on circles of radius close to
1. Using the triangle inequality, we deduce that for all $r \in (0,1)$
$$\frac{1}{2\pi}\int_0^{2\pi}|f(re^{ix})|^2\,dx \le \frac{1}{2\pi}\int_0^{2\pi}|f(r^2e^{2ix})|\,dx + \sup_{|z|\le 1}|P(z)| + cF(r),$$
where $F(r) = \frac{1}{2\pi}\int_0^{2\pi}\frac{dx}{|1 - re^{ix}|}$. Using Parseval's identity we obtain
$$\frac{1}{2\pi}\int_0^{2\pi}|f(re^{ix})|^2\,dx = \sum_{a\in A} r^{2a} = f(r^2).$$
On the other hand, the Cauchy-Schwarz inequality and Parseval's identity yield
$$\frac{1}{2\pi}\int_0^{2\pi}|f(r^2e^{2ix})|\,dx \le \sqrt{\frac{1}{2\pi}\int_0^{2\pi}|f(r^2e^{2ix})|^2\,dx} = \sqrt{f(r^4)} \le \sqrt{f(r^2)}.$$
Putting these remarks together, we obtain the estimate
$$f(r^2) \le \sqrt{f(r^2)} + C_1 + C_2F(r)$$
for some constants $C_1, C_2 > 0$ independent of $r$. This immediately yields
$f(r^2) \ll 1 + F(r)$. If we combine this with the result established in the first
paragraph, we deduce that $\frac{1}{\sqrt{1-r}} \ll 1 + F(r)$. It remains to prove that this
is not the case. Note that for $0 \le x \le \pi$ we have
$$|1 - re^{ix}|^2 = (1-r)^2 + 4r\sin^2(x/2) \ge (1-r)^2 + 4rx^2/\pi^2,$$
hence
$$F(r) = \frac{1}{\pi}\int_0^{\pi}\frac{dx}{|1 - re^{ix}|} \le \frac{1}{\pi}\int_0^{\pi}\frac{dx}{\sqrt{(1-r)^2 + 4rx^2/\pi^2}} = \frac{1}{2\sqrt r}\operatorname{arcsinh}\frac{2\sqrt r}{1-r}.$$
It follows that $\limsup_{r\to 1^-}\frac{F(r)}{\log\frac{1}{1-r}} \le \frac{1}{2}$ and hence
$$\lim_{r\to 1^-}\sqrt{1-r}\,F(r) = 0,$$
a contradiction. □








Remark 8.8. With basically the same techniques, but with more work, one
can prove the following beautiful theorem of Erdős and Fuchs:


Theorem 8.9. Let $A$ be a set of positive integers and let $r(n)$ be the number
of pairs $(a, b)$ with $a, b \in A$ such that $a + b = n$. Suppose that for some $\varepsilon > 0$
we have
$$\frac{r(0) + r(1) + \cdots + r(n)}{n+1} = C + O\left(\frac{1}{n^{\frac{3}{4}+\varepsilon}}\right)$$
for some constant $C$. Then $C = 0$.


For the origin of this result and a complete proof, see [62].





8.5 Miscellaneous problems 


The solution of the following problem is quite short, but the problem is 
actually quite tricky. 


22. Is it possible to partition the set of all 12-digit numbers (leading zeroes 
are allowed) into groups of four numbers such that the numbers in each 
group have the same digits in 11 places and four consecutive digits in 
the remaining place? 


St. Petersburg Olympiad 


Proof. The answer is negative. Assume that we found such a partition and
let $A$ be the set of all 12-digit numbers. Consider $f(X) = \sum_{a\in A} X^{s(a)}$, where
$s(a)$ is the sum of digits of $a$, and let $G_1, \dots, G_k$ be the groups appearing in
the partition. By the condition imposed on the structure of each group we
deduce that $\sum_{a\in G_i} X^{s(a)}$ is a multiple of $1 + X + X^2 + X^3$ for any $i$. Thus
$1 + X + X^2 + X^3$ divides
$$f(X) = \sum_{i=1}^{k}\sum_{a\in G_i} X^{s(a)}.$$
But we can actually compute $f$ in a closed form, since
$$f(X) = (1 + X + \cdots + X^9)^{12} = \left(\frac{1 - X^{10}}{1 - X}\right)^{12}.$$
Since this is clearly not a multiple of $1 + X + X^2 + X^3$ (for instance, because
it does not vanish at the imaginary unit $i$), the problem is solved. □




The following problems are quite challenging. They establish congruences 
using generating functions. The first one is taken from [64]. See also [77], [78], 
[79] for other references and delicate congruences. 

23. Let $p$ be a prime and let $d \in \{0, 1, \dots, p\}$. Prove that
$$\sum_{k=0}^{p-1}\binom{2k}{k+d} \equiv r \pmod p,$$
where $r \equiv p - d \pmod 3$, $r \in \{-1, 0, 1\}$.
H. Pan, Z.W. Sun
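Proof. The key point is to observe that, writing $[X^m]g$ for the coefficient of $X^m$
in $g$,
$$S = \sum_{k=0}^{p-1}\binom{2k}{k+d} = [X^{-2d}]\sum_{k=0}^{p-1}\left(X + \frac{1}{X}\right)^{2k}
= [X^{-2d}]\frac{\left(X + \frac{1}{X}\right)^{2p} - 1}{\left(X + \frac{1}{X}\right)^{2} - 1}.$$
Now, since we are only interested in $S \pmod p$, it is enough to understand
the previous rational function taken mod $p$. Since in $\mathbb{F}_p((X))$ we have
$$\left(X + \frac{1}{X}\right)^p = X^p + \frac{1}{X^p},$$
we deduce that $S \pmod p$ is the coefficient of $X^{-2d}$ in
$\frac{X^{2p} + X^{-2p} + 1}{X^2 + X^{-2} + 1} \in \mathbb{F}_p((X))$.
So $S \pmod p$ is also the coefficient of $X^{-d}$ in
$$\frac{X^p + X^{-p} + 1}{X + X^{-1} + 1} = \frac{X(1-X)}{1-X^3}\left(X^p + X^{-p} + 1\right)
= \left(X^p + X^{-p} + 1\right)\left(\sum_{n\ge 0} X^{3n+1} - \sum_{n\ge 0} X^{3n+2}\right).$$
By inspection of the last product, the result follows. □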
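The congruence of problem 23 is cheap to test for small primes. In the Python sketch below (ours; the name `pan_sun_holds` is illustrative), $r$ is computed as the representative of $p - d$ modulo 3 lying in $\{-1, 0, 1\}$.

```python
from math import comb

def pan_sun_holds(p):
    """Check sum_{k=0}^{p-1} C(2k, k+d) = r (mod p) for every d in 0..p."""
    for d in range(p + 1):
        s = sum(comb(2 * k, k + d) for k in range(p))
        r = ((p - d + 1) % 3) - 1  # maps residues 0, 1, 2 of p - d to 0, 1, -1
        if (s - r) % p != 0:
            return False
    return True

for p in (2, 3, 5, 7, 11, 13, 17, 19):
    assert pan_sun_holds(p)
```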




The following problem is similar in nature to problem 23, but more com- 
plicated. 


24. Let $p > 3$ be a prime. Prove that
$$\sum_{k=1}^{p^2-1}\binom{2k}{k} \equiv 0 \pmod{p^2}.$$
David Callan, AMM 11292


Proof. Let $S = \sum_{k=0}^{p^2-1}\binom{2k}{k}$, so we need to prove that $S \equiv 1 \pmod{p^2}$. The
first key point is to observe that
$$S = [X^0]\sum_{k=0}^{p^2-1} X^{-k}(1+X)^{2k} = [X^{p^2-1}]\frac{(1+X)^{2p^2} - X^{p^2}}{1 + X + X^2},$$
thus
$$S = [X^{p^2-1}]\frac{\left((1+X)^{2p^2} - X^{p^2}\right)(1-X)}{1-X^3}
= [X^{p^2-1}]\frac{1-X}{1-X^3}\sum_{k=0}^{2p^2}\binom{2p^2}{k}X^k.$$
The second key ingredient is the following nice


Lemma 8.10. For all $1 \le k < p^2$ we have
$$\binom{2p^2}{k} \equiv 2\binom{p^2}{k} \pmod{p^2}.$$


Proof. The left-hand side is the coefficient of $X^k$ in $(1+X)^{2p^2}$. But since
$(1+X)^{p^2} \equiv 1 + X^{p^2} \pmod p$ (this follows by raising to the $p$-th power the
equality $(1+X)^p \equiv 1 + X^p \pmod p$), we can write
$$(1+X)^{p^2} = 1 + X^{p^2} + pA(X)$$
for some $A \in \mathbb{Z}[X]$. The lemma follows then by taking the square of the
previous relation and comparing the coefficients of $X^k$. □


Coming back to the proof, we deduce that $S - 1$ is the same (mod $p^2$) as
twice the coefficient of $X^{p^2-1}$ in
$$(1-X)\sum_{p^2>k\ge 1,\ j\ge 0}\binom{p^2}{k}X^{k+3j},$$
which is precisely
$$S_1 = \sum_{1\le k<p^2,\ 3\mid k}\binom{p^2}{k} - \sum_{0\le k\le p^2-1,\ 3\mid k+1}\binom{p^2}{k}.$$
We need to prove that $S_1$ is a multiple of $p^2$. Well, the good news is that
$S_1 = 0$, since if $z = e^{\frac{2i\pi}{3}}$, then
$$S_1 = \sum_{k=0}^{p^2}\binom{p^2}{k}\frac{z^{k-1} - z^{1-k}}{z^{-1} - z} - 1
= \frac{z^{-1}(1+z)^{p^2} - z(1+z^{-1})^{p^2}}{z^{-1} - z} - 1.$$
On the other hand, since $p^2 \equiv 1 \pmod 6$, we have
$$(1+z)^{p^2} = 1 + z, \qquad (1+z^{-1})^{p^2} = 1 + z^{-1},$$
which combined with the previous relation shows that $S_1 = 0$. The result
follows. □
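Both the lemma and the final congruence of problem 24 can be tested directly with exact integer arithmetic. This is a minimal Python sketch under our own naming (`callan_sum_mod`, `lemma_holds`).

```python
from math import comb

def callan_sum_mod(p):
    """sum_{k=1}^{p^2-1} C(2k, k) reduced modulo p^2."""
    return sum(comb(2 * k, k) for k in range(1, p * p)) % (p * p)

def lemma_holds(p):
    """Lemma 8.10: C(2p^2, k) = 2 C(p^2, k) (mod p^2) for 1 <= k < p^2."""
    q = p * p
    return all((comb(2 * q, k) - 2 * comb(q, k)) % q == 0 for k in range(1, q))

for p in (5, 7, 11, 13):
    assert lemma_holds(p)
    assert callan_sum_mod(p) == 0
```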


The following result is very difficult. It was conjectured by Rodriguez-Villegas
and proved in a rather difficult way by E. Mortenson. The following
beautiful proof is due to R. Tauraso, taken from [83].


25. Prove that for any prime $p > 2$ we have
$$\sum_{k=0}^{p-1}\frac{1}{16^k}\binom{2k}{k}^2 \equiv (-1)^{\frac{p-1}{2}} \pmod{p^2}.$$




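Proof. Observe first of all that
$$\frac{1}{16^k}\binom{2k}{k} = \frac{\prod_{j=1}^{k}(2j-1)^2}{4^k\cdot(2k)!}
\equiv \frac{\prod_{j=1}^{k}\left((2j-1)^2 - p^2\right)}{4^k\cdot(2k)!}
= (-1)^k\binom{n+k}{2k} \pmod{p^2},$$
where $n = \frac{p-1}{2}$.

We will prove that for all nonnegative integers $n$ we have
$$\sum_{k=0}^{n}(-1)^k\binom{2k}{k}\binom{n+k}{2k} = (-1)^n$$
by computing the generating function of the left-hand side. We have
$$\sum_{n\ge 0}\left(\sum_{k=0}^{n}(-1)^k\binom{2k}{k}\binom{n+k}{2k}\right)X^n
= \sum_{k\ge 0}(-1)^k\binom{2k}{k}\sum_{n\ge k}\binom{n+k}{2k}X^n
= \sum_{k\ge 0}(-1)^k\binom{2k}{k}\frac{X^k}{(1-X)^{2k+1}}$$
$$= \frac{1}{1-X}\sum_{k\ge 0}\binom{2k}{k}\left(\frac{-X}{(1-X)^2}\right)^k
= \frac{1}{1-X}\left(1 + \frac{4X}{(1-X)^2}\right)^{-\frac{1}{2}}
= \frac{1}{1+X}.$$
Combining the previous two paragraphs yields the desired result: after
multiplying the first congruence by $\binom{2k}{k}$ (which also removes the factor $p$ that
$(2k)!$ may contain), the sum in the statement is congruent mod $p^2$ to
$\sum_{k=0}^{p-1}(-1)^k\binom{2k}{k}\binom{n+k}{2k}$, and the terms with $k > n$ vanish. □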
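The exact identity and the congruence of problem 25 can both be checked numerically. The Python sketch below is ours (function names are illustrative); the division by $16^k$ is performed modulo $p^2$ with a modular inverse, which assumes Python 3.8 or later for `pow(16, -1, m)`.

```python
from math import comb

def alternating_identity(n):
    """sum_{k=0}^{n} (-1)^k C(2k, k) C(n+k, 2k), expected to equal (-1)^n."""
    return sum((-1) ** k * comb(2 * k, k) * comb(n + k, 2 * k) for k in range(n + 1))

def rv_sum_mod(p):
    """sum_{k=0}^{p-1} C(2k, k)^2 / 16^k, computed modulo p^2."""
    m = p * p
    inv16 = pow(16, -1, m)  # 16 is invertible mod p^2 because p is odd
    return sum(comb(2 * k, k) ** 2 * pow(inv16, k, m) for k in range(p)) % m

for n in range(0, 25):
    assert alternating_identity(n) == (-1) ** n
for p in (3, 5, 7, 11, 13, 17):
    assert rv_sum_mod(p) == (-1) ** ((p - 1) // 2) % (p * p)
```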
8.6 Notes 


The following people provided solutions for the problems in this chapter: 
Robin Chapman (problem 24), Prasad Chebolu (problem 14), Darij Grinberg 




(problem 14), Xiangyi Huang (problems 18, 22), Mitchell Lee (problems 4, 10, 
12), Palmer Mebane (problem 11), Greg Martin (problem 2), Fedja Nazarov 
(problem 21), Fedor Petrov (problem 23), Richard Stong (problems 8, 12, 18), 
Qiaochu Yuan (problems 1, 6, 16), Victor Wang (problem 12). 


Addendum 8.A Lagrange’s Inversion Theorem


In many counting problems, we start by finding a recursive relation for
the number of objects we are trying to count, then we consider the generating
function of that sequence (or the exponential generating function, according to
the context) and establish a functional equation for it, based on the recursive
relation. For instance, when counting the number of ways to put parentheses
in a product of $n$ terms, the associated generating function turns out to satisfy
the easy functional equation $f(z) = 1 + zf(z)^2$. Of course, in this case one
can solve this quadratic equation and find a formula for $f(z)$, which in turn
allows us to find the desired number of ways (which is the classical $n$th Catalan
number). But what if the equation was $f(z) = 1 + zf(z)^5$? Then surely such a
method would not work, simply because it is impossible to solve this equation
using radicals. Of course, one might argue that such an equation is unlikely
to come up in enumerative combinatorics, but this is also wrong!

The purpose of this addendum is to present a basic and very powerful 
tool for dealing with such problems (and many more), the Lagrange inversion 
formula. After discussing the very beautiful proof of this result, we will turn 
to applications, which will hopefully show its power. 


8.A.1 Statement and proof of Lagrange’s inversion formula 


Before stating Lagrange’s inversion formula, let us recall a very basic 
result on compositional inverses. 


Proposition 8.A.1. Let $K$ be a field and let $F(T) = a_1T + a_2T^2 + \cdots \in K[[T]]$
be a formal series with $a_1 \neq 0$. Then there exists a unique $f \in K[[T]]$ such
that $F(f(T)) = T$ and it also satisfies $f(F(T)) = T$.


Proof. Let us look for solutions of the equation $F(f(T)) = T$ having the form
$f(T) = b_1T + b_2T^2 + \cdots$ (it is clear that if $f$ is a solution, then $f$ has zero
constant term). By comparing the coefficient of $T$ on both sides, we obtain
$b_1 = a_1^{-1}$. Doing the same for the coefficient of $T^2$ yields $a_1b_2 + a_2b_1^2 = 0$
and, in general, looking at the coefficient of $T^n$ we obtain an equation of the
form $a_1b_n + G_n(a_1, a_2, \ldots, a_n, b_1, \ldots, b_{n-1}) = 0$ for some polynomial $G_n$. This
already shows that if the solution exists, then it is unique. But it also shows
that a solution exists, since the previous equations can be solved recursively.

In order to prove that $f(F(T)) = T$, observe that $f$ has the same property
as $F$ (namely its constant term vanishes and its linear term is nonzero). So,
by the previous paragraph there is a unique $g$ such that $f(g(T)) = T$. But
then $F(T) = F(f(g(T))) = g(T)$ and so $F = g$ and $f(F(T)) = T$. □
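
The recursive solution described in the proof can be carried out mechanically on truncated series. Here is a minimal Python sketch, assuming that a series is represented by the list of its first $N$ coefficients over $\mathbb{Q}$; the helper names `mul`, `compose` and `compositional_inverse` are ours. At step $n$ the coefficient of $T^n$ in $F(f(T))$ is $a_1b_n + G_n$, so computing it with $b_n = 0$ gives $G_n$ and hence $b_n$.

```python
from fractions import Fraction

def mul(u, v, N):
    """Product of two truncated series, modulo T^N."""
    w = [Fraction(0)] * N
    for i, ui in enumerate(u):
        if ui == 0:
            continue
        for j, vj in enumerate(v):
            if i + j >= N:
                break
            w[i + j] += ui * vj
    return w

def compose(outer, inner, N):
    """outer(inner(T)) modulo T^N by Horner's rule; needs inner[0] == 0."""
    result = [Fraction(0)] * N
    for coeff in reversed(outer[:N]):
        result = mul(result, inner, N)
        result[0] += coeff
    return result

def compositional_inverse(F, N):
    """Coefficients b_0..b_{N-1} of the series f with F(f(T)) = T mod T^N."""
    assert F[0] == 0 and F[1] != 0
    f = [Fraction(0)] * N
    for n in range(1, N):
        # With b_n still 0, the T^n coefficient of F(f) equals G_n.
        g_n = compose(F, f, N)[n]
        target = 1 if n == 1 else 0
        f[n] = (target - g_n) / F[1]
    return f

if __name__ == "__main__":
    F = [0, 1, 1]                      # F(T) = T + T^2
    f = compositional_inverse(F, 6)
    print([int(c) for c in f])         # signed Catalan numbers
    print(compose(f, F, 6) == [0, 1, 0, 0, 0, 0])
```

The last line illustrates the second claim of the proposition: the series found from $F(f(T)) = T$ also satisfies $f(F(T)) = T$, at least up to the truncation order.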


We are now ready to state and prove the main theoretical result of this 
addendum. The proof given here (due to Hofbauer) is an adaptation of an 
analytic proof that would work over complex numbers. However, using the 
purely algebraic notion of residue of a Laurent series, one can obtain a proof 
over any field of characteristic zero. Recall that if $f \in K[[T]]$, then $[T^n] f(T)$
denotes the coefficient of $T^n$ in $f$.


Theorem 8.A.2. (Lagrange’s inversion formula) Let $K$ be a field of characteristic
0 and let $e$ be a series in $K[[T]]$ whose constant term is nonzero. If
$f \in K[[T]]$ satisfies $f(T) = Te(f(T))$, then for any $g \in K[[T]]$ and any $n \geq 1$
we have

$$[T^n]\,g(f(T)) = \frac{1}{n}\,[T^{n-1}]\bigl(g'(T)e(T)^n\bigr).$$





Proof. Let $F(T) = \frac{T}{e(T)}$, so $F(T) = a \cdot T + \cdots$ for some $a \in K^*$, as the constant
term of $e$ is nonzero. The hypothesis $f(T) = Te(f(T))$ becomes $F(f(T)) = T$.
Thus $f$ is the (compositional) inverse of $F$ and so $f(F(T)) = T$, too. If
$h(T) = \sum_{n \in \mathbb{Z}} a_nT^n \in K[[T]][1/T]$ is a Laurent series with coefficients in $K$
(so $a_n = 0$ for all $n$ small enough), let $\mathrm{res}(h\,dT) = a_{-1}$. Obviously, for all $h$
we have $\mathrm{res}(h'\,dT) = 0$. The key point is the following “change of variable
formula”:


Lemma 8.A.3. If $G \in K[[T]][1/T]$, we have

$$\mathrm{res}(G(T)\,dT) = \mathrm{res}(G(F(T))F'(T)\,dT).$$


Proof. By linearity, we may assume that $G(T) = T^k$, with $k \in \mathbb{Z}$. If $k \neq -1$,
both $G(T)$ and $G(F(T))F'(T)$ are of the form $g'$ with $g \in K[[T]][1/T]$ and
so both terms of the equality we want to establish are zero. So, assume that
$k = -1$. Then²

$$\frac{F'(T)}{F(T)} = \frac{1}{T} - \frac{e'(T)}{e(T)} = \frac{1}{T} - \left(\log\frac{e(T)}{e(0)}\right)',$$

so $\mathrm{res}\left(\frac{F'(T)}{F(T)}\,dT\right) = \mathrm{res}\left(\frac{dT}{T}\right)$ and we are done again. □


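Coming back to the proof of the theorem and using the lemma, we can
write

$$[T^n]\,g(f(T)) = \mathrm{res}\left(\frac{g(f(T))}{T^{n+1}}\,dT\right) = \mathrm{res}\left(\frac{g(f(F(T)))}{F(T)^{n+1}}F'(T)\,dT\right) = \mathrm{res}\left(\frac{g(T)F'(T)}{F(T)^{n+1}}\,dT\right) = -\frac{1}{n}\,\mathrm{res}\bigl(g(T)(F(T)^{-n})'\,dT\bigr).$$

As $u'v + uv' = (uv)'$, we have $\mathrm{res}(u'v\,dT) = -\mathrm{res}(uv'\,dT)$ for all $u, v$, which combined
with the previous equality yields

$$[T^n]\,g(f(T)) = \frac{1}{n}\,\mathrm{res}\left(\frac{g'(T)}{F(T)^n}\,dT\right) = \frac{1}{n}\,\mathrm{res}\left(\frac{g'(T)e(T)^n}{T^n}\,dT\right) = \frac{1}{n}\,[T^{n-1}]\bigl(g'(T)e(T)^n\bigr).$$

The result follows. □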
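
The formula is easy to test on truncated series. The sketch below is only an illustration under our own conventions: series are coefficient lists over $\mathbb{Q}$, all helper names are ours, and $f$ is computed by iterating $f \mapsto Te(f)$, which fixes one more coefficient at each step. For $e(T) = (1+T)^2$ and $g(T) = T$ both sides give the Catalan numbers $1, 2, 5, 14, 42, \ldots$

```python
from fractions import Fraction

def mul(u, v, N):
    """Product of two truncated series, modulo T^N."""
    w = [Fraction(0)] * N
    for i, ui in enumerate(u):
        if ui == 0:
            continue
        for j, vj in enumerate(v):
            if i + j >= N:
                break
            w[i + j] += ui * vj
    return w

def compose(outer, inner, N):
    """outer(inner(T)) modulo T^N by Horner's rule; needs inner[0] == 0."""
    result = [Fraction(0)] * N
    for coeff in reversed(outer[:N]):
        result = mul(result, inner, N)
        result[0] += coeff
    return result

def power(u, n, N):
    """u(T)^n modulo T^N."""
    r = [Fraction(1)] + [Fraction(0)] * (N - 1)
    for _ in range(n):
        r = mul(r, u, N)
    return r

def derivative(u):
    """Formal derivative of a truncated series."""
    return [k * u[k] for k in range(1, len(u))]

def solve_fixed_point(e, N):
    """The series f with f = T * e(f), modulo T^N."""
    f = [Fraction(0)] * N
    for _ in range(N):
        f = [Fraction(0)] + compose(e, f, N)[:N - 1]
    return f

def lagrange_rhs(e, g, n, N):
    """(1/n) [T^(n-1)] (g'(T) e(T)^n), for 1 <= n < N."""
    prod = mul(derivative(g), power(e, n, N), N)
    return Fraction(prod[n - 1], n)

if __name__ == "__main__":
    N = 7
    e = [1, 2, 1]                       # e(T) = (1 + T)^2
    f = solve_fixed_point(e, N)
    print([int(c) for c in f[1:6]])
    print([int(lagrange_rhs(e, [0, 1], n, N)) for n in range(1, 6)])
```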


Here is an easy application of the inversion formula, which is not so easy 
to prove by other means: 


²Note that $\log\frac{e(T)}{e(0)} = \sum_{n \ge 1}\frac{(-1)^{n-1}}{n}\left(\frac{e(T)}{e(0)} - 1\right)^n$ exists in $K[[T]]$, as $\frac{e(T)}{e(0)} - 1 \in TK[[T]]$.







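Example 8.A.4. Suppose that two sequences $(a_n)_n$, $(b_n)_n$ of complex numbers
satisfy

$$b_n = \sum_{k}\binom{k}{n-k}a_k$$

for all $n \ge 0$. Prove that

$$a_n = \frac{1}{n}\sum_{k}(-1)^{n-k}\binom{2n-k-1}{n-k}\,k\,b_k$$

for all $n \ge 1$.

Proof. Let $A(X) = \sum_{n \ge 0} a_nX^n$ and $B(X) = \sum_{n \ge 0} b_nX^n$ be the generating
functions of the two sequences. An easy computation shows that

$$B(X) = \sum_{n} X^n\left(\sum_{k}\binom{k}{n-k}a_k\right) = \sum_{k} a_kX^k\sum_{n}\binom{k}{n-k}X^{n-k} = \sum_{k} a_kX^k(1+X)^k = A(X+X^2).$$

Let $Y = X + X^2$, so that $X = f(Y)$ with $f(Y) = Y \cdot \frac{1}{1+f(Y)}$. Using Lagrange’s
inversion formula, we obtain

$$a_n = [Y^n](A(Y)) = [Y^n](B(X)) = [Y^n]\,B(f(Y)) = \frac{1}{n}[Y^{n-1}]\bigl(B'(Y)(1+Y)^{-n}\bigr)$$

$$= \frac{1}{n}\sum_{k} kb_k\,[Y^{n-1}]\bigl(Y^{k-1}(1+Y)^{-n}\bigr) = \frac{1}{n}\sum_{k} kb_k\binom{-n}{n-k}$$

and the result follows from the easily checked equality

$$\binom{-n}{n-k} = (-1)^{n-k}\binom{2n-k-1}{n-k}. \qquad □$$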




8.A.2 Two variations and some applications 


In applications, one also encounters the following versions of the inversion 
formula. For instance, we will use the following result to prove a very nice 
combinatorial identity due to Abel. 


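Theorem 8.A.5. Let $K$ be a field of characteristic 0 and let $e$ be a series in
$K[[T]]$ whose constant term is nonzero. Then for all $f \in K[[T]]$ we have

$$f(T) = f(0) + \sum_{n \ge 1}\frac{1}{n}[T^{n-1}]\bigl(f'(T)e(T)^n\bigr)\cdot\left(\frac{T}{e(T)}\right)^n.$$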


Proof. Let $X = X(T)$ be a formal series such that $X = Te(X)$ (it exists by
proposition 8.A.1 applied to $F(T) = \frac{T}{e(T)}$). Then Lagrange’s inversion formula
yields

$$f(X(T)) = f(0) + \sum_{n \ge 1}[T^n]\,f(X(T))\cdot T^n = f(0) + \sum_{n \ge 1}\frac{1}{n}[T^{n-1}]\bigl(f'(T)e(T)^n\bigr)\cdot T^n.$$

Now, substitute $T = Y/e(Y)$ to obtain the desired result. □


Here is the promised application, which is really not easy to prove by 
direct computational means. 


Theorem 8.A.6. (Abel’s identity) For all complex numbers $a, x, y$ and all
positive integers $n$,

$$(x+y)^n = \sum_{k=0}^{n}\binom{n}{k}x(x+ak)^{k-1}(y-ak)^{n-k}.$$
Proof. The desired equality is equivalent to

$$\frac{(x+y)^n}{n!} = \sum_{k=0}^{n}\frac{x(x+ak)^{k-1}}{k!}\cdot\frac{(y-ak)^{n-k}}{(n-k)!}.$$

Let us consider the generating functions of the two sides of the previous equality.
The generating function of the left-hand side is $e^{(x+y)T}$. The generating
function of the right-hand side is

$$\sum_{n} T^n\sum_{0 \le k \le n}\frac{x(x+ak)^{k-1}}{k!}\cdot\frac{(y-ak)^{n-k}}{(n-k)!} = \sum_{k}\frac{x(x+ak)^{k-1}}{k!}\,T^ke^{(y-ak)T}.$$

Thus, by dividing by $e^{yT}$, it suffices to prove that

$$e^{xT} = 1 + \sum_{k \ge 1}\frac{x(x+ak)^{k-1}}{k!}\,T^ke^{-akT}.$$

But this is a consequence of theorem 8.A.5 with
$f(T) = e^{xT}$ and $e(T) = e^{aT}$. □
Let us apply this result to prove the following nice-looking identity.

Example 8.A.7. Prove that for all $n \ge 1$ we have

$$\sum_{i,j \ge 0,\ i+j=n}\binom{n}{i}(i+1)^{i-1}(j+1)^{j-1} = 2(n+2)^{n-1}.$$

AMM E. 2828

Proof. Take $a = 1$, $x = 1$ and $y = n + 1$ in the previous theorem. We obtain

$$\sum_{i+j=n}\binom{n}{i}(i+1)^{i-1}(j+1)^{j} = (n+2)^n.$$

Unfortunately, this is not really what we want to prove, but it shows that we
are on the right track. To get rid of the extra 1 in the exponent of $j + 1$,
differentiate with respect to $y$ the equality in the previous theorem and then
take $a = 1$, $x = 1$ and $y = n + 1$. This time we end up with

$$n(n+2)^{n-1} = \sum_{i+j=n}\binom{n}{i}(i+1)^{i-1}j(j+1)^{j-1}.$$

Now, observing that $j = j + 1 - 1$, we can write the last sum as

$$\sum_{i+j=n}\binom{n}{i}(i+1)^{i-1}(j+1)^{j} - \sum_{i+j=n}\binom{n}{i}(i+1)^{i-1}(j+1)^{j-1}.$$

Combining this with the first relation, the result follows. □
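
Since Abel’s identity is a polynomial identity in $a, x, y$, it can be tested exactly with rational arithmetic. The following Python sketch (function names are ours) evaluates the right-hand side of theorem 8.A.6 and the left-hand side of example 8.A.7. We assume the usual convention that the $k = 0$ term $x \cdot x^{-1}y^n$ is read as $y^n$, which is what the polynomial identity requires when $x = 0$.

```python
from fractions import Fraction
from math import comb

def abel_rhs(a, x, y, n):
    """sum_{k=0}^{n} C(n,k) x (x+ak)^(k-1) (y-ak)^(n-k), exactly."""
    total = y ** n  # k = 0 term, reading x * x^(-1) as 1
    for k in range(1, n + 1):
        total += comb(n, k) * x * (x + a * k) ** (k - 1) * (y - a * k) ** (n - k)
    return total

def e2828_lhs(n):
    """sum_{i+j=n} C(n,i) (i+1)^(i-1) (j+1)^(j-1), exactly."""
    return sum(comb(n, i)
               * Fraction(i + 1) ** (i - 1)
               * Fraction(n - i + 1) ** (n - i - 1)
               for i in range(n + 1))

if __name__ == "__main__":
    a, x, y = Fraction(-2, 3), Fraction(5, 7), Fraction(1, 2)
    for n in range(1, 6):
        print(n, abel_rhs(a, x, y, n) == (x + y) ** n,
              e2828_lhs(n) == 2 * (n + 2) ** (n - 1))
```

`Fraction` is used in `e2828_lhs` because the exponents $i - 1$ and $j - 1$ equal $-1$ at the ends of the sum.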




Here is a quite exotic problem: suppose that $e \in K[[T]]$ and consider the
sequence $a_n = [T^n](e(T)^n)$. What is the generating function of the sequence
$a_n$? The following result answers this question in a more general context:

Theorem 8.A.8. Let $X, Y$ be variables satisfying $Y = Xe(Y)$, where $e \in
K[[T]]$ has nonzero constant term. Then for any $F \in K[[T]]$ we have

$$\sum_{n \ge 0}[T^n]\bigl(F(T)e(T)^n\bigr)\cdot X^n = \frac{F(Y)}{1 - Xe'(Y)}.$$

Proof. Apply theorem 8.A.5 to $f(T) = \int_0^T \frac{F(u)}{e(u)}\,du$ (recall that this is formal
integration, so $f(T)$ is the unique formal series vanishing at 0 and such that
$f'(u)e(u) = F(u)$). Using that $f' = \frac{F}{e}$, we obtain

$$f(Y) = \sum_{n \ge 1}\frac{1}{n}[T^{n-1}]\bigl(F(T)e(T)^{n-1}\bigr)\left(\frac{Y}{e(Y)}\right)^n = \sum_{n \ge 1}\frac{1}{n}[T^{n-1}]\bigl(F(T)e(T)^{n-1}\bigr)\cdot X^n.$$

Differentiating this equality with respect to $Y$ we obtain

$$\frac{F(Y)}{e(Y)} = \sum_{n \ge 1}[T^{n-1}]\bigl(F(T)e(T)^{n-1}\bigr)X^{n-1}\,\frac{dX}{dY}.$$

Finally, differentiating $Y = Xe(Y)$ we obtain $dY = dX \cdot e(Y) + Xe'(Y)\,dY$, so

$$\frac{dX}{dY} = \frac{1 - Xe'(Y)}{e(Y)}.$$

Replacing this in the previous equality yields the desired equality. □


Example 8.A.9. Find a closed form for the generating function of the sequence
$a_n$, where $a_n$ is the constant term of $\left(1 + X + \frac{1}{X}\right)^n$.




Proof. Note that $a_n = [T^n](e(T)^n)$, where $e(T) = T^2 + T + 1$. So, by the
previous theorem

$$\sum_{n} a_nX^n = \frac{1}{1 - Xe'(Y)},$$

where $Y = Xe(Y)$. Solving the equation in $Y$ yields

$$Y = \frac{1 - X - \sqrt{1 - 2X - 3X^2}}{2X}$$

and then an easy computation shows that

$$\sum_{n} a_nX^n = \frac{1}{\sqrt{1 - 2X - 3X^2}}. \qquad □$$
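
The closed form can be checked without any square roots: it says that $A(X)^2(1 - 2X - 3X^2) = 1$, where $A(X) = \sum_n a_nX^n$. The sketch below (function names are ours) computes the constant terms directly and verifies this relation modulo $X^N$.

```python
def central_trinomial(n):
    """[T^n] (1 + T + T^2)^n, i.e. the constant term of (1 + X + 1/X)^n."""
    poly = [1]
    for _ in range(n):
        nxt = [0] * (len(poly) + 2)
        for i, c in enumerate(poly):
            nxt[i] += c
            nxt[i + 1] += c
            nxt[i + 2] += c
        poly = nxt
    return poly[n]

def squared_series_times_quadratic(N):
    """Coefficients of A(X)^2 * (1 - 2X - 3X^2) modulo X^N."""
    a = [central_trinomial(n) for n in range(N)]
    square = [sum(a[i] * a[n - i] for i in range(n + 1)) for n in range(N)]
    q = [1, -2, -3]
    return [sum(q[j] * square[n - j] for j in range(3) if n - j >= 0)
            for n in range(N)]

if __name__ == "__main__":
    print([central_trinomial(n) for n in range(8)])
    print(squared_series_times_quadratic(8))  # expected: 1 followed by zeros
```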


8.A.3 Examples from enumerative combinatorics


In this section we consider applications of the inversion formula in count- 
ing problems. We start with an absolutely classical and beautiful theorem of 
Cayley, but we need a series of definitions before stating it. Recall that a tree 
is a connected graph with no cycle. A labeled tree on the set {1,2,...,n} is 
a tree whose set of vertices is {1,2,...,n}. A rooted tree is a tree in which 
one of the vertices (called the root) is distinguished. There is a unique (non- 
backtracking) path between any two vertices of a tree. The parent of a vertex 
in a rooted tree is the vertex connected to it on the unique path to the root. 
A child of a vertex v is a vertex whose parent is v. A tree is called ordered if 
one is given an ordering of the children of each vertex. 


Theorem 8.A.10. There are $n^{n-2}$ unordered labeled trees on the set
$\{1, 2, \ldots, n\}$.


Proof. Let $a_n$ be the number of unordered rooted labeled trees on $\{1, 2, \ldots, n\}$,
with the natural convention that $a_0 = 0$. It is enough to prove that $a_n = n^{n-1}$,
as clearly the number of unordered labeled trees is $a_n/n$.

The point is that giving such a tree is the same as giving its root and a
forest of subtrees whose roots are the children of the root. Suppose that the
root has $k$ children and that the corresponding subtrees have $n_1, n_2, \ldots, n_k$
vertices, so $n_1 + n_2 + \cdots + n_k = n - 1$. The number of ways to distribute the
$n - 1$ vertices different from the root in these $k$ subtrees is

$$\binom{n-1}{n_1}\binom{n-1-n_1}{n_2}\cdots = \frac{(n-1)!}{n_1!\,n_2!\cdots n_k!}.$$

Once such a distribution is made, we have $a_{n_1}a_{n_2}\cdots a_{n_k}$ ways to label the
elements of the forest and so the contribution to the total number is

$$\frac{(n-1)!}{n_1!\,n_2!\cdots n_k!}\,a_{n_1}a_{n_2}\cdots a_{n_k}.$$

The total contribution coming from all partitions of $n - 1$ is

$$\sum_{n_1+\cdots+n_k=n-1}\frac{(n-1)!}{n_1!\,n_2!\cdots n_k!}\,a_{n_1}a_{n_2}\cdots a_{n_k},$$

however each configuration is counted $k!$ times (since we do not care about
the order of the children of the root) and the main root can be chosen in $n$
ways. We finally deduce the very complicated recurrence relation

$$a_n = n\sum_{k}\frac{1}{k!}\sum_{n_1+\cdots+n_k=n-1}\frac{(n-1)!}{n_1!\,n_2!\cdots n_k!}\,a_{n_1}a_{n_2}\cdots a_{n_k}.$$

This simplifies drastically if we consider the exponential generating function
$T(X) = \sum_{n \ge 0} a_n\frac{X^n}{n!}$, since

$$[X^{n-1}]\,T(X)^k = \sum_{n_1+\cdots+n_k=n-1}\frac{a_{n_1}}{n_1!}\cdots\frac{a_{n_k}}{n_k!}.$$

Hence the recurrence relation can also be written

$$\frac{a_n}{n!} = [X^{n-1}]\sum_{k}\frac{T(X)^k}{k!} = [X^{n-1}]\,e^{T(X)},$$

that is $Xe^{T(X)} = T(X)$. Using Lagrange’s inversion formula, we obtain

$$\frac{a_n}{n!} = [X^n](T(X)) = \frac{1}{n}[X^{n-1}]\,e^{nX} = \frac{n^{n-1}}{n!},$$

from where the result follows. □




We end this addendum with two more difficult examples. The following 
beautiful result is taken from [65]. 


Example 8.A.11. An intransitive tree on the set of vertices $\{1, 2, \ldots, n\}$ is a tree
such that for all $1 \le i < j < k \le n$, $\{i, j\}$ and $\{j, k\}$ are not simultaneously
edges. Prove that the number of such trees is

$$\frac{1}{n \cdot 2^{n-1}}\sum_{k=1}^{n}\binom{n}{k}k^{n-1}.$$

Note that it is absolutely not clear that the above quantity is an integer!
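
Before turning to the proof, the statement can be sanity-checked by brute force for small $n$: enumerate all labeled trees on $\{1, \ldots, n\}$ and count those in which no vertex has both a smaller and a larger neighbor. The Python sketch below enumerates trees through Prüfer sequences, a standard bijection between labeled trees and sequences in $\{1, \ldots, n\}^{n-2}$ that is not used in the text; all function names are ours.

```python
from itertools import product
from math import comb

def prufer_to_edges(seq, n):
    """Decode a Pruefer sequence into the edge list of a tree on 1..n (n >= 2)."""
    degree = [1] * (n + 1)          # index 0 is unused
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(1, n + 1) if degree[u] == 1)
        edges.append((min(leaf, v), max(leaf, v)))
        degree[leaf] -= 1           # the leaf is removed
        degree[v] -= 1
    u, w = [x for x in range(1, n + 1) if degree[x] == 1]
    edges.append((u, w))
    return edges

def is_intransitive(edges):
    """True if no vertex j has neighbors i < j < k."""
    has_larger, has_smaller = set(), set()
    for u, w in edges:              # edges are stored with u < w
        has_larger.add(u)
        has_smaller.add(w)
    return not (has_larger & has_smaller)

def count_intransitive_trees(n):
    return sum(is_intransitive(prufer_to_edges(seq, n))
               for seq in product(range(1, n + 1), repeat=n - 2))

def claimed_count(n):
    """(1 / (n 2^(n-1))) * sum_k C(n,k) k^(n-1), checking that it is an integer."""
    total = sum(comb(n, k) * k ** (n - 1) for k in range(1, n + 1))
    assert total % (n * 2 ** (n - 1)) == 0
    return total // (n * 2 ** (n - 1))

if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_intransitive_trees(n), claimed_count(n))
```

The enumeration visits $n^{n-2}$ trees, so it is only practical for small $n$.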


Proof. Let $F_n$ be the number of such trees and let

$$F(T) = \sum_{n \ge 0} F_{n+1}\frac{T^n}{n!}$$

be an associated exponential generating function. Call a vertex $i$ left if all of
its neighbors are greater than $i$. Let $L_n$ be the number of rooted intransitive
trees on the set of vertices $\{1, 2, \ldots, n\}$, whose root is a left vertex. Then
$L_1 = 1$ and we clearly have $L_n = \frac{n}{2}F_n$ for $n \ge 2$ ($n$ comes from the choice of
the root, division by 2 takes into account the fact that the probability that
the root is left is $1/2$). If

$$L(T) = \sum_{n \ge 1} L_n\frac{T^n}{n!}$$

is the exponential generating function associated to $L_n$, then $L_n = \frac{n}{2}F_n$ for
$n \ge 2$ yields $L(T) = \frac{T}{2}(1 + F(T))$. But the exponential formula implies that
$F(T) = e^{L(T)}$, as an intransitive tree on the set of vertices $\{1, 2, \ldots, n+1\}$ is
obtained from a forest of left-rooted trees on $\{1, 2, \ldots, n\}$, by connecting $n+1$
to each root. Thus, we obtain $F(T) = e^{\frac{T}{2}(1+F(T))}$. If $f(T) = T(1 + F(T))$,
we deduce that $f(T) = T(1 + e^{f(T)/2})$. An application of Lagrange’s inversion
formula yields, for $n \ge 2$,

$$\frac{F_n}{(n-1)!} = [T^n]\,f(T) = \frac{1}{n}[T^{n-1}]\bigl(1 + e^{T/2}\bigr)^n = \frac{1}{n}[T^{n-1}]\sum_{k}\binom{n}{k}e^{kT/2}.$$

If we expand $e^{kT/2}$ and collect terms according to the exponent of $T$, we finally
deduce that

$$F_n = \frac{1}{n \cdot 2^{n-1}}\sum_{k}\binom{n}{k}k^{n-1}. \qquad □$$

Finally, a question by James Propp with a nice proof from [69].


Example 8.A.12. The vertices of a polygon $P$ with $N + 2$ vertices are labeled
$1, 2, 1, 2, \ldots$ in order (stopping when the end is reached). Let $a_N$ be the number
of triangulations of $P$ with no monochromatic triangle. Then $a_N = \frac{2^n}{2n+1}\binom{3n}{n}$
if $N = 2n$ and $a_N = \frac{2^{n+1}}{2n+2}\binom{3n+1}{n}$ if $N = 2n + 1$.





Proof. Define $a_0 = 1$. Suppose that $N = 2n+1$ and call a triangulation proper
if it contains no monochromatic triangle. Consider a proper triangulation $\tau$
of $P$. Note that $P$ has an edge labeled 1,1. This edge must be a side of a
triangle with a vertex labeled 2. If this vertex is the $i$th vertex labeled 2,
with $i \ge 0$, then the two sides of the triangle split $P$ into a $(2i+2)$-gon and a
$(2n-2i+2)$-gon and both of these polygons are properly triangulated by $\tau$. By
adding over all $i$ we obtain $a_{2n+1} = \sum_{i} a_{2i}a_{2n-2i}$, and a similar argument
for $N$ even yields $a_{2n} = \sum_{i=0}^{2n-1} a_ia_{2n-1-i}$. This suggests considering the two
generating functions

$$A(X) = \sum_{n \ge 1} a_{2n}X^n, \qquad B(X) = \sum_{n \ge 0} a_{2n+1}X^n.$$

Then the previous relations can be written in a compact form

$$A(X) = 2X(1 + A(X))B(X), \qquad B(X) = (1 + A(X))^2.$$

Indeed, we have

$$B(X) = \sum_{n \ge 0} a_{2n+1}X^n = \sum_{n \ge 0}\left(\sum_{i=0}^{n} a_{2i}a_{2n-2i}\right)X^n = \left(\sum_{i \ge 0} a_{2i}X^i\right)^2 = (1 + A(X))^2$$

and

$$A(X) = \sum_{n \ge 1}\left(\sum_{i=0}^{2n-1} a_ia_{2n-1-i}\right)X^n = X\cdot\sum_{n \ge 0}\left(\sum_{i=0}^{2n+1} a_ia_{2n+1-i}\right)X^n$$

$$= X\cdot\sum_{n \ge 0}\left(\sum_{i=0}^{n} a_{2i}a_{2n+1-2i}\right)X^n + X\cdot\sum_{n \ge 0}\left(\sum_{i=0}^{n} a_{2i+1}a_{2n-2i}\right)X^n = 2X(1 + A(X))B(X).$$

We deduce that $A(X) = 2X(1 + A(X))^3$ and an easy application of Lagrange’s
inversion formula finishes the proof. □
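
The two recurrences from the proof determine all the $a_N$ starting from $a_0 = 1$, so the closed forms of the statement can be compared against them directly. A minimal Python sketch, with function names of our own choosing:

```python
from math import comb

def proper_triangulation_counts(M):
    """a_0..a_M from a_{2n+1} = sum a_{2i} a_{2n-2i}, a_{2n} = sum a_i a_{2n-1-i}."""
    a = [1]                                   # convention a_0 = 1
    for N in range(1, M + 1):
        if N % 2 == 1:
            n = (N - 1) // 2
            a.append(sum(a[2 * i] * a[2 * n - 2 * i] for i in range(n + 1)))
        else:
            a.append(sum(a[i] * a[N - 1 - i] for i in range(N)))
    return a

def closed_form(N):
    """2^n C(3n,n)/(2n+1) for N = 2n, and 2^(n+1) C(3n+1,n)/(2n+2) for N = 2n+1."""
    n = N // 2
    if N % 2 == 0:
        return 2 ** n * comb(3 * n, n) // (2 * n + 1)
    return 2 ** (n + 1) * comb(3 * n + 1, n) // (2 * n + 2)

if __name__ == "__main__":
    a = proper_triangulation_counts(10)
    print(a)
    print([closed_form(N) for N in range(11)])
```

For instance $a_3 = 4$: the pentagon labeled $1, 2, 1, 2, 1$ has five triangulations, exactly one of which contains the monochromatic triangle on the three vertices labeled 1.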


8.A.4 Composition of generating functions


One of the key points in the proof of Cayley’s theorem 8.A.10 is to establish
the functional equation $Xe^{T(X)} = T(X)$ for the exponential generating
function of the number of labeled trees. We would like to give a more ab- 
stract and general context for this kind of argument, which appears very often 
in counting problems. We follow rather closely the wonderful book [76] and 
we strongly advise the reader to take a look at the first chapter of it, which 
contains an impressive number of examples and problems on this topic. 

Let $K$ be a field of characteristic 0 and let $f, g : \mathbb{N} \to K$ be two sequences
of elements of $K$. Let $E_f = \sum_{n \ge 0} f(n)\frac{X^n}{n!}$ be the exponential generating function of $f$. We
would like to give a combinatorial interpretation of the generating functions
$E_f \cdot E_g$ and $E_g \circ E_f$. Note that $E_f \cdot E_g = E_h$, where

$$h(n) = \sum_{k=0}^{n}\binom{n}{k}f(k)g(n-k).$$

We deduce that for any finite set $X$ we have

$$h(|X|) = \sum_{S \subset X} f(|S|)g(|X - S|) = \sum_{(S_1,S_2)} f(|S_1|)g(|S_2|),$$

the second sum being taken over all ordered partitions $(S_1, S_2)$ of $X$ (and $S_1$,
$S_2$ may be empty). By an obvious induction, we deduce that

$$E_{f_1}E_{f_2}\cdots E_{f_k} = E_h,$$

where

$$h(|X|) = \sum f_1(|S_1|)f_2(|S_2|)\cdots f_k(|S_k|),$$

the sum being taken again over ordered partitions of $X$. For instance, consider
the problem of counting the number of partitions $(S_1, S_2, \ldots, S_k)$ of
$\{1, 2, \ldots, n\}$, where each $S_i$ is nonempty. By taking $f(n) = 1$ if $n \ge 1$ and
$f(0) = 0$, we deduce that the exponential generating function for the number
of such partitions is

$$E_f(X)^k = (e^X - 1)^k = \sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}e^{jX}$$

and expanding $e^{jX}$ yields the desired number of partitions.

Suppose now that $f(0) = 0$. We would like to understand the generating
function $E_g \circ E_f$.


Theorem 8.A.13. Let $f, g : \mathbb{N} \to K$ be sequences such that $f(0) = 0$ and
$g(0) = 1$. Then $E_g \circ E_f = E_h$, where $h : \mathbb{N} \to K$ is a sequence such that
$h(0) = 1$ and for all finite sets $X$ we have

$$h(|X|) = \sum g(k)\,f(|S_1|) \cdot f(|S_2|)\cdots f(|S_k|),$$

the sum being taken over all unordered partitions $(S_1, S_2, \ldots, S_k)$ (with arbitrary
$k$) of $X$ into nonempty subsets.


Proof. We clearly have $h = \sum_{k \ge 0} g(k)h_k$, where

$$h_k(|X|) = \sum f(|S_1|)\cdots f(|S_k|),$$

the sum being taken over unordered partitions with $k$ classes. Hence it is
enough to prove that $E_{h_k} = \frac{E_f^k}{k!}$. But this follows from the previous discussion
and the fact that we are only considering unordered partitions here (thus the $k$
classes may be permuted in $k!$ ways and yield the same unordered partition). □




Example 8.A.14. Let us consider again the problem of finding the functional
equation for the exponential generating function of the number of unordered
rooted trees on $\{1, 2, \ldots, n\}$. Let $T$ be this generating function. Then by the
previous theorem $Xe^{T(X)}$ is the generating function for the number of pairs
$(r, F)$, where $r$ is a root and $F$ is a forest of unordered rooted trees starting
from this root. But it is clear that any unordered rooted tree arises in this
way, so we actually have $Xe^{T(X)} = T(X)$.


Example 8.A.15. Suppose that we want to count rooted unordered labeled
trees such that the number of children of each node is in a fixed set $S$, containing
0. The argument used in the previous example yields the functional
equation

$$f(X) = X\sum_{s \in S}\frac{f(X)^s}{s!}.$$

Using Lagrange’s inversion formula, we obtain a formula for the number of
such trees.


Example 8.A.16. Let $E$ be the exponential generating function for the number of connected
graphs with vertices $1, 2, \ldots, n$. Giving a graph with vertices $1, 2, \ldots, n$ is the
same as giving a family of disjoint connected graphs (its connected components),
thus by theorem 8.A.13 the generating function for the graphs with
vertices $1, 2, \ldots, n$ is $e^{E}$. But since there are $2^{\binom{n}{2}}$ such graphs, we deduce that

$$E(X) = \log\left(\sum_{n \ge 0} 2^{\binom{n}{2}}\frac{X^n}{n!}\right).$$
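
To extract actual numbers from this relation one does not need to expand a logarithm. Writing $G = e^{E}$ and differentiating gives $G' = E'G$, and comparing coefficients yields $g_n = \sum_{k=1}^{n}\binom{n-1}{k-1}c_kg_{n-k}$, where $g_n = 2^{\binom{n}{2}}$ and $c_n$ is the number of connected graphs on $\{1, \ldots, n\}$. This recurrence is our own rewriting of the example, not a formula from the text; the Python sketch below (with an illustrative function name) solves it for $c_n$.

```python
from math import comb

def connected_graph_counts(N):
    """c_0..c_N, where c_n counts connected labeled graphs on n vertices.

    Uses g_n = sum_{k=1}^{n} C(n-1, k-1) c_k g_{n-k} with g_n = 2^C(n,2),
    which follows from G' = E' G for G = exp(E).
    """
    g = [2 ** comb(n, 2) for n in range(N + 1)]
    c = [0] * (N + 1)
    for n in range(1, N + 1):
        c[n] = g[n] - sum(comb(n - 1, k - 1) * c[k] * g[n - k]
                          for k in range(1, n))
    return c

if __name__ == "__main__":
    print(connected_graph_counts(6)[1:])
```

Combinatorially, the recurrence classifies a graph by the connected component containing the vertex $n$.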


Example 8.A.17. Consider two sequences $f, g$ such that $f(0) = 0$, $g(0) = 1$
and define $h(0) = 1$ and

$$h(|X|) = \sum_{\sigma \in \mathrm{Sym}(X)} g(k)\,f(|C_1|) \cdot f(|C_2|)\cdots f(|C_k|),$$

where $\mathrm{Sym}(X)$ is the set of permutations of $X$ and $C_1, C_2, \ldots, C_k$ are the
cycles of $\sigma$. Then theorem 8.A.13 yields³

$$E_h(X) = E_g\left(\sum_{n \ge 1} f(n)\frac{X^n}{n}\right).$$


Example 8.A.18. For nonnegative integers $c_1, c_2, \ldots$, let $a_n(c_1, c_2, \ldots)$ be the
number of permutations $\sigma \in S_n$ having $c_i$ cycles of length $i$ for all $i \le n$.
Consider indeterminates $X_1, X_2, \ldots$. Then the previous example yields the
following cycle-index formula

$$\sum_{n \ge 0}\ \sum_{c_1, c_2, \ldots, c_n \ge 0} a_n(c_1, c_2, \ldots)\,X_1^{c_1}X_2^{c_2}\cdots X_n^{c_n}\,\frac{T^n}{n!} = \exp\left(\sum_{n \ge 1} X_n\frac{T^n}{n}\right).$$
Example 8.A.19. Let us count the number of permutations of odd order, i.e.
for which all cycles have odd length. By taking $X_i = 1$ when $i$ is odd and 0
otherwise, we deduce that the exponential generating function for this counting
problem is

$$E_f = \exp\left(\sum_{n \ge 1,\ n \text{ odd}}\frac{T^n}{n}\right) = \exp\left(\frac{\log(1+T) - \log(1-T)}{2}\right) = \sqrt{\frac{1+T}{1-T}}$$

and an easy application of the binomial formula yields the number of permutations
$f(n) = (1 \cdot 3 \cdots (n-1))^2$ when $n$ is even and $f(n) = (1 \cdot 3 \cdots (n-2))^2 \cdot n$
when $n$ is odd.

³Note that we also have

$$h(|X|) = \sum_{\pi} g(k)\,f(|S_1|)(|S_1| - 1)!\cdots f(|S_k|)(|S_k| - 1)!,$$

the sum being taken over unordered partitions $\pi$ of $X$, since the cycle decomposition of a
permutation yields a partition of $X$ and since one can cyclically permute in $(|S_i| - 1)!$ ways
the elements of a class with $|S_i|$ elements.
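
For small $n$ the closed form can be compared with a direct enumeration of all $n!$ permutations. The following Python sketch does this; the function names are ours and the enumeration is only meant as a check for $n \le 7$ or so.

```python
from itertools import permutations

def cycle_lengths(perm):
    """Cycle lengths of a permutation of 0..n-1 given as a tuple of images."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, v = 0, start
            while not seen[v]:
                seen[v] = True
                v = perm[v]
                length += 1
            lengths.append(length)
    return lengths

def count_odd_cycle_permutations(n):
    """Number of permutations of n points all of whose cycles have odd length."""
    return sum(all(length % 2 == 1 for length in cycle_lengths(p))
               for p in permutations(range(n)))

def closed_form(n):
    """(1*3*...*(n-1))^2 for even n, and (1*3*...*(n-2))^2 * n for odd n."""
    top = n - 1 if n % 2 == 0 else n - 2
    odd_product = 1
    for m in range(1, top + 1, 2):
        odd_product *= m
    square = odd_product * odd_product
    return square if n % 2 == 0 else square * n

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, count_odd_cycle_permutations(n), closed_form(n))
```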

Example 8.A.20. Let $k$ be a positive integer and let $f(n)$ be the number of
permutations $\sigma \in S_n$ such that $\sigma^k = 1$. This is equivalent to the fact that the
length of each cycle of $\sigma$ divides $k$. Thus by the previous examples

$$E_f = \exp\left(\sum_{d \mid k}\frac{X^d}{d}\right).$$


8.A.5 More tree-counting problems 


In this section we present another proof of Cayley’s theorem as well as 
some similar counting problems, all related to trees. The following general 
result is quite useful in problems concerning trees. 


Theorem 8.A.21. Let $d_1, d_2, \ldots, d_n$ be positive integers such that $d_1 + d_2 +
\cdots + d_n = 2n - 2$. Then the number of trees on the set $\{v_1, v_2, \ldots, v_n\}$ such
that vertex $v_i$ has degree $d_i$ is $\frac{(n-2)!}{(d_1-1)!(d_2-1)!\cdots(d_n-1)!}$.


Proof. We will prove the result by induction on $n$. We may assume that
$d_n = 1$, by permuting the $d_i$’s if necessary. Consider a tree on $\{v_1, v_2, \ldots, v_n\}$
such that $\deg(v_i) = d_i$ and remove vertex $v_n$ and the unique edge whose
endpoint is $v_n$. We obtain a tree on $\{v_1, v_2, \ldots, v_{n-1}\}$ whose degrees are
$d_1, \ldots, d_{j-1}, d_j - 1, d_{j+1}, \ldots, d_{n-1}$ if $v_n$ is connected to $v_j$. Conversely, any
such tree on $\{v_1, \ldots, v_{n-1}\}$ yields a tree on $\{v_1, \ldots, v_n\}$ simply by connecting
$v_n$ with $v_j$. It follows that there are

$$\sum_{j=1}^{n-1}(d_j - 1)\cdot\frac{(n-3)!}{\prod_{k=1}^{n-1}(d_k - 1)!} = \frac{(n-2)!}{\prod_{k=1}^{n}(d_k - 1)!}$$

such trees and the result follows. □




Cayley’s theorem is a direct consequence of the previous theorem and of
the multinomial formula: the number of trees on $\{1, 2, \ldots, n\}$ is

$$\sum_{\substack{d_1+\cdots+d_n=2n-2 \\ d_i \ge 1}}\frac{(n-2)!}{(d_1-1)!\cdots(d_n-1)!} = \sum_{\substack{i_1+\cdots+i_n=n-2 \\ i_j \ge 0}}\frac{(n-2)!}{i_1!\cdots i_n!} = (1 + 1 + \cdots + 1)^{n-2} = n^{n-2}.$$


Proposition 8.A.22. There are $\binom{n-2}{k-1} \cdot (n-1)^{n-k-1}$ labeled trees with vertices
$1, 2, \ldots, n$ in which vertex 1 has degree $k$.


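Proof. This is also an easy consequence of the previous theorem. The desired
number of trees is

$$\sum_{k+d_2+\cdots+d_n=2(n-1)}\frac{(n-2)!}{(k-1)!(d_2-1)!\cdots(d_n-1)!} = \binom{n-2}{k-1}\sum_{d_2+\cdots+d_n=2n-2-k}\frac{(n-k-1)!}{(d_2-1)!\cdots(d_n-1)!} = \binom{n-2}{k-1}(n-1)^{n-k-1},$$

the second equality being a consequence of the multinomial formula. □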
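
Theorem 8.A.21 and proposition 8.A.22 can both be checked by listing all labeled trees for small $n$ and recording their degree sequences. The Python sketch below generates the trees from Prüfer sequences (a standard bijection with $\{1, \ldots, n\}^{n-2}$, assumed here and not taken from the text) and reads the degrees off the decoded edges; all names are ours.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

def prufer_to_edges(seq, n):
    """Decode a Pruefer sequence into the edge list of a tree on 1..n (n >= 2)."""
    degree = [1] * (n + 1)          # index 0 is unused
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(1, n + 1) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(1, n + 1) if degree[x] == 1]
    edges.append((u, w))
    return edges

def degree_census(n):
    """Map each degree sequence (d_1..d_n) to the number of trees realizing it."""
    census = Counter()
    for seq in product(range(1, n + 1), repeat=n - 2):
        deg = [0] * (n + 1)
        for u, w in prufer_to_edges(seq, n):
            deg[u] += 1
            deg[w] += 1
        census[tuple(deg[1:])] += 1
    return census

def predicted_count(degrees):
    """(n-2)! / ((d_1-1)! ... (d_n-1)!) from theorem 8.A.21."""
    denominator = 1
    for d in degrees:
        denominator *= factorial(d - 1)
    return factorial(len(degrees) - 2) // denominator

def trees_with_degree_at_vertex_1(census, k):
    return sum(count for degrees, count in census.items() if degrees[0] == k)

if __name__ == "__main__":
    n = 5
    census = degree_census(n)
    print(sum(census.values()), n ** (n - 2))
    for k in range(1, n):
        print(k, trees_with_degree_at_vertex_1(census, k),
              comb(n - 2, k - 1) * (n - 1) ** (n - k - 1))
```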


Let us introduce a very useful notion in graph theory. 


Definition 8.A.23. Let G be a loopless graph. A spanning forest is a sub- 
graph without cycles and having the same vertices as G. A spanning tree is a 
connected spanning forest. 


Here is a nice application of Abel’s identity. 


Example 8.A.24. There are $(n-2) \cdot n^{n-3}$ spanning trees of $K_n$ which do not
contain a fixed edge of $K_n$.


Proof. Call $1, 2, \ldots, n$ the vertices of $K_n$ and assume without loss of generality
that the fixed edge is $e = 12$. Let $f(n)$ be the number of spanning trees that
contain $e$. Such a tree appears uniquely as a result of the following process:
consider two trees $T_1, T_2$ whose vertices form a partition of $1, 2, \ldots, n$ with 1
a vertex of $T_1$ and 2 a vertex of $T_2$, and then join these two trees by the edge
$e$. If $T_1$ has $k$ vertices different from 1, these vertices can be chosen in $\binom{n-2}{k}$
ways. Once these vertices are chosen, we have $(k+1)^{k-1}$ possibilities for $T_1$
and $(n-k-1)^{n-k-3}$ possibilities for $T_2$. Thus

$$f(n) = \sum_{k=0}^{n-2}\binom{n-2}{k}(k+1)^{k-1}(n-k-1)^{n-k-3} = 2n^{n-3},$$

the last equality being an easy consequence of example 8.A.7. The result
follows. □


Remark 8.A.25. Here is another approach, suggested by Richard Stong. Let
$X_e$ be the indicator of the event that a randomly chosen spanning tree of $K_n$ contains
the edge $e$. Then by symmetry it is clear that $E[X_e]$ is the same for all edges
$e$. Since any spanning tree of $K_n$ has exactly $n - 1$ edges we have

$$\sum_{e} E[X_e] = n - 1.$$

Therefore since all $\binom{n}{2}$ terms in this sum are equal,

$$E[X_e] = \frac{n-1}{\binom{n}{2}} = \frac{2}{n}.$$

Hence by Cayley’s formula, there are $2 \cdot n^{n-3}$ spanning trees containing $e$
and $(n-2)n^{n-3}$ that do not contain $e$.


Chapter 9 


A Little Introduction to 
Algebraic Number Theory 


This rather long chapter is concerned with elementary algebraic number 
theory. The techniques are rather diverse: basic linear algebra, algebraic num- 
bers and symmetric polynomials, cyclotomy and p-adic analysis are some of 
the topics discussed in this chapter. Since we will use the notion of algebraic 
number quite often in this chapter, we end this introduction with a few rec- 
ollections. For more details and some proofs, the reader is referred to the 
addendum 9.B. 

A complex number $z$ is called algebraic if it is a root of some nonzero polynomial
with rational coefficients. In this case, there exists a unique monic polynomial
with rational coefficients, called the minimal polynomial of $z$, which
vanishes at $z$ and has minimal degree. The roots of this polynomial are called
the conjugates of $z$. The crucial property of the minimal polynomial is that
it is irreducible over $\mathbb{Q}$ and divides any polynomial with rational coefficients
that vanishes at $z$. A fundamental theorem in algebraic number theory states
that the algebraic numbers form an algebraically closed subfield of $\mathbb{C}$, thus an
algebraic closure of $\mathbb{Q}$. If $z$ is an algebraic number, we let $\mathbb{Q}(z)$ (or $\mathbb{Q}[z]$) be
the subfield of $\mathbb{C}$ generated by $z$. It consists of all numbers of the form $f(z)$,
with $f \in \mathbb{Q}[X]$ (or equivalently $f \in \mathbb{Q}(X)$). This is a finite extension of $\mathbb{Q}$,
of degree equal to the degree of the minimal polynomial of $z$. The primitive
element theorem ensures that all finite extensions of $\mathbb{Q}$ are of the form $\mathbb{Q}(z)$
for some algebraic number $z$. We call such extensions number fields. We will
frequently use the notation $[L : K]$ to denote the dimension of $L$ as a $K$-vector
space, as well as the fundamental tower relation $[M : K] = [M : L] \cdot [L : K]$
for any finite extensions $M/L/K$.

A more refined notion is that of algebraic integer. This is a complex 
number that is killed by some monic polynomial with integer coefficients. By 
Gauss’ lemma, we can characterize algebraic integers as those algebraic num- 
bers whose minimal polynomial has integer coefficients. An easy but funda- 
mental result is that a rational number which is also an algebraic integer is 
necessarily a rational integer. Another important result is that the algebraic 
integers form a subring of the field of algebraic numbers. 


9.1 Tools from linear algebra 


In this section we consider a few applications of linear algebra to num- 
ber theory. These concern especially divisibility issues and linear diophantine 
equations. 


1. Let $a, b, c$ be relatively prime nonzero integers. Prove that for any relatively
prime integers $u, v, w$ satisfying $au + bv + cw = 0$, there are integers
$m, n, p$ such that

$$a = nw - pv, \quad b = pu - mw, \quad c = mv - nu.$$


Octavian Stanasila, Romanian TST 1989 
Proof. Consider the linear system in the variables $m, n, p$

$$a = nw - pv, \quad b = pu - mw, \quad c = mv - nu.$$

Trivially, the determinant of this system is 0 and the rank of its associated
matrix is 2. It is thus enough to solve in integers the system $a = nw - pv$,
$b = pu - mw$. This system has integer solutions if and only if there is an integer
$p$ such that $vp \equiv -a \pmod w$ and $up \equiv b \pmod w$. Now, the hypothesis
implies the existence of integers $A, B, C$ such that $Au + Bv + Cw = 1$. We
deduce that $Aua + Bva \equiv a \pmod w$. Since $au + bv \equiv 0 \pmod w$, we deduce
that $(Ab - Ba)v \equiv -a \pmod w$, so that we can take $p = Ab - Ba$ to get
$vp \equiv -a \pmod w$. Also, we can immediately check that

$$u(Ab - Ba) = Aub - Bua \equiv b(Au + Bv) \equiv b \pmod w. \qquad □$$


Proof. We will actually prove a stronger result: for any integers $a, b, c$ and
any integers $u, v, w$ such that $au + bv + cw = 0$ and $\gcd(u, v, w) = 1$, there
exist integers $A, B, C$ such that $a = Bw - Cv$, $b = Cu - Aw$ and $c = Av - Bu$.
Indeed, since $\gcd(u, v, w) = 1$, a standard application of Bézout’s lemma
yields the existence of integers $X, Y, Z$ such that $Xu + Yv + Zw = 1$. Let us
define

$$A = cY - bZ, \quad B = aZ - cX, \quad C = bX - aY.$$

Then,

$$Bw - Cv = (aZ - cX)w - (bX - aY)v = a(Xu + Yv + Zw) - X(au + bv + cw) = a.$$

Thus $a = Bw - Cv$ and similarly $b = Cu - Aw$ and $c = Av - Bu$. The result
follows. □
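
This second proof is completely explicit, so it translates directly into a short program. The Python sketch below computes Bézout coefficients $X, Y, Z$ with the extended Euclidean algorithm and then forms $A, B, C$ as in the proof; the function names are ours, and the choice of Bézout coefficients is of course not unique.

```python
def ext_gcd(x, y):
    """Return (g, s, t) with g = gcd(x, y) >= 0 and s*x + t*y == g."""
    if y == 0:
        return (abs(x), 1 if x >= 0 else -1, 0)
    g, s, t = ext_gcd(y, x % y)
    return (g, t, s - (x // y) * t)

def bezout3(u, v, w):
    """Return (g, X, Y, Z) with X*u + Y*v + Z*w == g == gcd(u, v, w)."""
    g1, s, t = ext_gcd(u, v)
    g, p, q = ext_gcd(g1, w)
    return g, p * s, p * t, q

def solve_system(a, b, c, u, v, w):
    """Integers (A, B, C) with a = Bw - Cv, b = Cu - Aw, c = Av - Bu."""
    assert a * u + b * v + c * w == 0
    g, X, Y, Z = bezout3(u, v, w)
    assert g == 1, "u, v, w must be relatively prime"
    return (c * Y - b * Z, a * Z - c * X, b * X - a * Y)

if __name__ == "__main__":
    a, b, c, u, v, w = 5, -3, 0, 6, 10, 15
    A, B, C = solve_system(a, b, c, u, v, w)
    print((A, B, C), (B * w - C * v, C * u - A * w, A * v - B * u))
```

The sample input uses $u, v, w = 6, 10, 15$, which are relatively prime as a triple although no two of them are coprime.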


A very nice and classical result is that $\prod_{1 \le i < j \le n}\frac{a_j - a_i}{j - i}$ is an integer for
any integers $a_1, a_2, \ldots, a_n$. There are many proofs of this result, at least two
of them being presented in [3]. The following problem is a variation on this
topic.


2. Prove that for any integers $a_1, a_2, \ldots, a_n$, the number

$$\frac{\operatorname{lcm}(a_1, a_2, \ldots, a_n)}{a_1a_2\cdots a_n}\prod_{1 \le i < j \le n}(a_j - a_i)$$

is an integer divisible by $1!2!\cdots(n-2)!$. Moreover, we cannot replace
$1!2!\cdots(n-2)!$ by any other multiple of $1!2!\cdots(n-2)!$.




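Proof. Consider the matrix $A = (a_{ij})_{1 \le i,j \le n}$ with $a_{ij} = \binom{a_i - 1}{j - 2}$ for $j \ge 2$ and
$a_{i1} = \frac{\operatorname{lcm}(a_1, a_2, \ldots, a_n)}{a_i}$. We will prove that

$$\det(A) = \frac{\operatorname{lcm}(a_1, a_2, \ldots, a_n)}{a_1a_2\cdots a_n}\cdot\frac{\prod_{1 \le i < j \le n}(a_j - a_i)}{1!2!\cdots(n-2)!}.$$

Since the entries of $A$ are integers, it is clear that its determinant is an integer,
from which the first part of the problem will follow.

Factoring an $L = \operatorname{lcm}(a_1, a_2, \ldots, a_n)$ out of the first column and multiplying
the $i$-th row by $a_i$, it follows that

$$\det A = L\begin{vmatrix} \frac{1}{a_1} & \binom{a_1-1}{0} & \cdots & \binom{a_1-1}{n-2} \\ \frac{1}{a_2} & \binom{a_2-1}{0} & \cdots & \binom{a_2-1}{n-2} \\ \vdots & \vdots & & \vdots \\ \frac{1}{a_n} & \binom{a_n-1}{0} & \cdots & \binom{a_n-1}{n-2} \end{vmatrix} = \frac{L}{a_1a_2\cdots a_n}\begin{vmatrix} 1 & a_1 & a_1(a_1-1) & \cdots & a_1\binom{a_1-1}{n-2} \\ 1 & a_2 & a_2(a_2-1) & \cdots & a_2\binom{a_2-1}{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n(a_n-1) & \cdots & a_n\binom{a_n-1}{n-2} \end{vmatrix}.$$

Taking out the factorials that appear at the denominators of the binomials
in each column shows that

$$\det A = \frac{L}{a_1a_2\cdots a_n\,1!2!\cdots(n-2)!}\begin{vmatrix} 1 & a_1 & a_1^2 - a_1 & \cdots & a_1^{n-1} + \cdots \\ 1 & a_2 & a_2^2 - a_2 & \cdots & a_2^{n-1} + \cdots \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 - a_n & \cdots & a_n^{n-1} + \cdots \end{vmatrix}.$$

Note that the $(i, j)$ entry of this matrix can be written as $P_j(a_i)$, where $P_j$,
$0 \le j \le n - 1$, is a monic polynomial of degree $j$. For each column $j$ add
a suitable linear combination of the previous columns to reduce the previous
determinant to

$$\frac{L}{a_1\cdots a_n\,1!2!\cdots(n-2)!}\begin{vmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\ 1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^{n-1} \end{vmatrix}.$$

Since the last determinant is Vandermonde, the identity stated above is proved.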




To see that the result is optimal, simply choose an = (n!)* and a; = (n!)? + i 
for 1< i< n. Then 


lem(a1,a2,.. ., an) = nina, ++ an- 
because the numbers an, %,i = 1,2,...,n — 1 are pairwise relatively prime. 
The result follows easily from this. O 


Remark 9.1. Quantities such as $\prod_{i<j}\frac{a_j - a_i}{j - i}$ and the one in the previous problem
have natural combinatorial interpretations: they are the dimensions of
some irreducible representations of special unitary groups. Of course, explain- 
ing this is beyond the scope of this modest book, but the reader should know 
that these are not “just some random problems.” 


Remark 9.2. A similar result is proved in the beautiful paper [6]: if 


are i 2 2) + . 21.4! (2n)! 
Q0, Q1,- , An are integers, then []j<;(aj — a5) is a multiple of Sr and 
this result is optimal. This is also related to the dimension of some irreducible 


representations of the symplectic group. 


We continue with a nice application of linear-algebraic arguments. The 
ideas used in the following solutions are very useful in other contexts, too. 


3. Let $p$ be a prime and let $a_1, a_2, \ldots, a_{p+1}$ be real numbers such that no
matter how we eliminate one of them, the remaining numbers can be
divided into at least two nonempty pairwise disjoint subsets each having
the same arithmetic mean. Prove that $a_1 = a_2 = \cdots = a_{p+1}$.


Marius Radulescu, Romanian TST 1994 


Proof. Subtracting from the $a_i$'s their arithmetic mean (observe that the new numbers have the same property), we may assume that $a_1 + a_2 + \cdots + a_{p+1} = 0$. Fix some $1 \le j \le p+1$ and let $C_1, \ldots, C_k$ be the classes of a partition of $\{a_1, \ldots, a_{p+1}\} - \{a_j\}$ such that $\frac{1}{|C_l|}\sum_{x \in C_l} x$ does not depend on $l$. Since the sum of the $a_i$'s is zero and since

\[\sum_{l} \sum_{x \in C_l} x = -a_j,\]

the common arithmetic mean equals $-\frac{a_j}{p}$, and we deduce that we have a linear relation of the form

\[\frac{a_{i_1}}{|C_1|} + \frac{a_{i_2}}{|C_1|} + \cdots + \frac{a_{i_r}}{|C_1|} + \frac{a_j}{p} = 0,\]

where $a_{i_1}, \ldots, a_{i_r}$ are the elements of $C_1$ and $|C_1| < p$.

Now, consider all such linear relations, obtained by making $j$ run over all $1, 2, \ldots, p+1$. This gives us a linear system with $p+1$ equations and $p+1$ unknowns (the $a_i$'s), whose matrix has $\frac{1}{p}$ on the main diagonal and zeros or numbers of the form $\frac{1}{k}$ with $k < p$ elsewhere. But then the determinant of the matrix will be of the form $\frac{m}{p^{p+1}}$, for some rational $m \equiv 1 \pmod p$. Thus, the determinant is nonzero and since the system is homogeneous, the only solution is the trivial one. This implies that all $a_i$'s are zero and the conclusion follows. □


Proof. First, we will reduce the problem to the case when all $a_i$ are integers. The following method is classical and very useful in a whole variety of situations: consider the vector space spanned over $\mathbb{Q}$ by the $a_i$'s. This is a finite dimensional $\mathbb{Q}$-vector space and if we take a basis of it and write each $a_i$ as a linear combination with rational coefficients of the elements of the basis, we easily see that the coordinates of the $a_i$'s also satisfy the conditions of the problem (because by definition the elements of the basis are linearly independent over $\mathbb{Q}$). Working coordinate by coordinate therefore reduces the problem to the case when all $a_i$ are rational. Multiplying all $a_i$'s by $N!$ for some sufficiently large $N$ then reduces the problem to the case when all $a_i$ are integers.

Assume now that all $a_i$ are integers and let us prove the result by induction on $\max |a_i|$. The base case is obvious, so let us focus on the inductive step. Removing any element $a_i$ gives nonempty, pairwise disjoint sets $S_{i,j} \subset \{a_1, a_2, \ldots, a_{p+1}\} - \{a_i\} = A_i$ so that $\frac{1}{|S_{i,j}|}\sum_{r \in S_{i,j}} r$ is independent of $j$, say equal to $k$. Then

\[\frac{1}{|S_{i,1}|}\sum_{r \in S_{i,1}} r = k = \frac{1}{\sum_{j \ne 1} |S_{i,j}|}\sum_{r \in \bigcup_{j \ne 1} S_{i,j}} r.\]

Let $|S_{i,1}| = m$ and observe that $\sum_{j \ne 1} |S_{i,j}| = p - m$ and

\[(p-m)\sum_{r \in S_{i,1}} r = m \sum_{r \in A_i - S_{i,1}} r \implies \sum_{r \in A_i} r \equiv 0 \pmod p.\]

Letting $i$ vary and writing $r$ for the residue of $a_1 + \cdots + a_{p+1}$ modulo $p$, we obtain

\[\sum_{i=1}^{p+1} a_i \equiv r \pmod p \implies a_i \equiv r \pmod p \quad \forall i \in \{1, 2, \ldots, p+1\}.\]

Thus, we can write $a_i = p b_i + r$ for some integers $b_i$. But clearly $\{b_1, b_2, \ldots, b_{p+1}\}$ also satisfy the conditions of the problem and moreover $\max |b_i| < \max |a_i|$. By the inductive hypothesis all $b_i$'s are equal and so all $a_i$'s are equal. □


Proof. Here's a “no formula” proof, which uses the same kind of argument, but replaces the choice of a basis in a vector space with an approximation argument: first, we reduce to the case when all $a_i$ are integers in the following way. By Dirichlet's approximation theorem there exists a large integer $M$ such that all $M a_i$ are very close to some integer $A_i$. The linear equations deduced from the fact that the $a_i$'s satisfy the conditions of the problem become approximate linear equations for the $A_i$'s. But there are finitely many such equations and each has rational coefficients. Thus, if at the beginning we ensured that the $M a_i$ are sufficiently close to the $A_i$'s, the approximate equations in the $A_i$ are actually exact. Thus the $A_i$'s are integers satisfying the conditions of the problem. If we solved the problem over the integers, it follows that all $A_i$ are equal. But then any two $a_i$'s are less than $2/M$ apart and since $M$ is arbitrarily large, this implies that all $a_i$ are equal.

Now, let us assume that the $a_i$'s are integers. If we remove one number, the common rational arithmetic mean for the rest of the numbers cannot have $p$ in the denominator, so the sum of all other numbers is rational with $p$ in the numerator and, thereby, an integer divisible by $p$. Hence all numbers have the same remainder modulo $p$ as their sum. Now continue as in the end of the previous solution. □




9.2 Cyclotomy 
There are $\varphi(n)$ primitive $n$th roots of unity, namely $e^{\frac{2i\pi k}{n}}$ where $1 \le k \le n$ is relatively prime to $n$. Hence the $n$th cyclotomic polynomial

\[\phi_n(X) = \prod_{\substack{1 \le k \le n \\ \gcd(k,n)=1}} \left(X - e^{\frac{2i\pi k}{n}}\right)\]

has degree $\varphi(n)$. The splitting field $\mathbb{Q}\left(e^{\frac{2i\pi}{n}}\right)$ of $\phi_n$ is called the $n$th cyclotomic extension of $\mathbb{Q}$. These polynomials and their splitting fields play a very important role in many areas of mathematics and gave rise to a whole series of very deep results. Their study would require a whole book by itself, so we decided to focus only on some very elementary and classical applications.

Since any $n$th root of unity in $\mathbb{C}$ is primitive of order $d$ for a unique $d \mid n$, we get the:

Proposition 9.3. (Fundamental identity) We have

\[X^n - 1 = \prod_{d \mid n} \phi_d(X).\]

This easily implies (by strong induction) that $\phi_n(X) \in \mathbb{Z}[X]$ for all $n$. The following result is not trivial and plays an important role in many proofs concerning cyclotomic polynomials. We'll also see that a weak form of Dirichlet's theorem follows very easily from it. For a proof of Dirichlet's theorem in full generality, see addendum 7.A.


Theorem 9.4. Let $a$ be an integer and let $p$ be a prime divisor of $\phi_n(a)$. Then either the order of $a$ modulo $p$ is $n$ (and so $p \equiv 1 \pmod n$) or $p$ divides $n$.

Proof. By assumption, $p$ divides $\phi_n(a)$, hence also $a^n - 1$, and so if $a$ has order $k \pmod p$ then $k \mid n$. If $k < n$, then $p$ divides both $a^k - 1$ and $\frac{a^n - 1}{a^k - 1}$ (the second because of the fundamental identity and the fact that $p$ divides $\phi_n(a)$). As $\gcd\left(a^k - 1, \frac{a^n - 1}{a^k - 1}\right)$ divides $\frac{n}{k}$ by the Euclidean algorithm, $p \mid n$ and we're done. □





Remark 9.5. Note that the proof also works for prime powers p. 




Theorem 9.6. For all $n$ there are infinitely many primes $p \equiv 1 \pmod n$.

Proof. For $k > n$ large enough (it is actually enough to take $k > 2$) we have $\phi_n(k!) > 1$ and so we can choose some prime $p_k \mid \phi_n(k!)$. Since $\phi_n(0)$ is $1$ or $-1$, we have $\phi_n(k!) \equiv \pm 1 \pmod{k!}$, which obviously implies that $\gcd(p_k, k!) = 1$. As $k > n$ we get $p_k > k > n$ and by the previous theorem we deduce that $p_k \equiv 1 \pmod n$. The result follows. □
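The reader who wishes to experiment may find the following Python sketch useful. It is only an illustration, not part of the proofs: it builds the cyclotomic polynomials from the fundamental identity by exact division, and then lists the prime divisors of the value at $k!$. The function names (`divide_exact`, `cyclotomic`, `prime_divisors_of_value`) and the representation of a polynomial as a tuple of coefficients, lowest degree first, are our own choices, and the factorization is by naive trial division, so only small $n$ and $k$ are realistic.

```python
from functools import lru_cache
from math import factorial


def divide_exact(num, den):
    """Quotient num / den for integer polynomials (lowest degree first).

    Assumes den is monic and divides num exactly.
    """
    num = list(num)
    quotient = [0] * (len(num) - len(den) + 1)
    for shift in range(len(quotient) - 1, -1, -1):
        coeff = num[shift + len(den) - 1]
        quotient[shift] = coeff
        for i, d in enumerate(den):
            num[shift + i] -= coeff * d
    assert not any(num), "division was not exact"
    return tuple(quotient)


@lru_cache(maxsize=None)
def cyclotomic(n):
    """phi_n as a coefficient tuple, from X^n - 1 = prod_{d | n} phi_d(X)."""
    poly = (-1,) + (0,) * (n - 1) + (1,)  # X^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return poly


def evaluate(poly, x):
    """Horner evaluation of a coefficient tuple at the integer x."""
    value = 0
    for coeff in reversed(poly):
        value = value * x + coeff
    return value


def prime_factors(m):
    """Sorted prime divisors of m, by trial division (small inputs only)."""
    m, primes, p = abs(m), [], 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes


def prime_divisors_of_value(n, k):
    """Prime divisors of phi_n(k!); for k >= n each should be 1 mod n."""
    return prime_factors(evaluate(cyclotomic(n), factorial(k)))
```

For instance, `prime_divisors_of_value(3, 3)` factors $\phi_3(6) = 43$ and `prime_divisors_of_value(4, 4)` factors $\phi_4(24) = 577$; both values are primes congruent to $1$ modulo $n$, as theorem 9.4 predicts.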


For the next three problems we will use a very useful rationality result: if $r$ and $\cos(r\pi)$ are both rational numbers, then $\cos(r\pi) \in \{\pm 1, \pm\frac{1}{2}, 0\}$. Let us recall the argument: $2\cos(r\pi) = e^{i\pi r} + e^{-i\pi r}$ and the numbers $e^{i\pi r}, e^{-i\pi r}$ are algebraic integers (they are roots of unity), so $2\cos(r\pi)$ is an algebraic integer. Thus, if it is rational, it has to be a rational integer and the result follows. Before passing to the next problem, let us discuss a beautiful consequence of the previous observation. We will prove that the only regular $n$-gons all of whose vertices are lattice points are the squares. Indeed, let $A, B, C$ be three consecutive vertices of the polygon and observe that

\[\frac{1 + \cos\frac{4\pi}{n}}{2} = \cos^2\frac{2\pi}{n} = \frac{(AB^2 + BC^2 - AC^2)^2}{4 AB^2 \cdot BC^2} \in \mathbb{Q}.\]

Using the previous observation, the result follows easily. We strongly advise the reader to look for a geometric proof in order to appreciate the power of algebraic numbers!


4. Let $A, B, C$ be lattice points such that the angles of triangle $ABC$ are rational multiples of $\pi$. Prove that triangle $ABC$ is right and isosceles.


Proof. Note that any angle $\theta = \angle ABC$ with $A$, $B$, and $C$ lattice points must have $\tan\theta$ rational or infinite. To see this note that all lines between lattice points have rational or infinite slopes and if $\tan\alpha$ and $\tan\beta$ are rational (or infinite) then so is $\tan(\alpha - \beta) = \frac{\tan\alpha - \tan\beta}{1 + \tan\alpha\tan\beta}$. This implies that

\[\tan^2\theta = \sec^2\theta - 1 = \frac{1 - \cos 2\theta}{1 + \cos 2\theta}\]

is rational and hence $\cos 2\theta$ is rational. Combining this with the discussion preceding the problem shows that $\cos 2A$, $\cos 2B$, $\cos 2C$ all belong to $\{\pm 1, \pm\frac{1}{2}, 0\}$. It is immediate to check that the condition that $\tan A$, $\tan B$, $\tan C$ be rational or infinite says that $A, B, C$ must be integer multiples of $\pi/4$. Hence the only possibility is when $ABC$ is right and isosceles. □


5. Let $\alpha$ be a rational number with $0 < \alpha < 1$ and

\[\cos(3\pi\alpha) + 2\cos(2\pi\alpha) = 0.\]

Prove that $\alpha = \frac{2}{3}$.


IMO Shortlist 1991 


Proof. Let $x = \cos\pi\alpha$ and observe that the equation satisfied by $\alpha$ can be written as

\[4x^3 + 4x^2 - 3x - 2 = 0 \iff (2x+1)(2x^2 + x - 2) = 0.\]

Of course, if $x = -\frac{1}{2}$, we must have $\alpha = \frac{2}{3}$ and we are done. The difficult point is to prove that we cannot have $2x^2 + x - 2 = 0$. If this is the case, then $x = \frac{-1 + \sqrt{17}}{4}$ because $|x| \le 1$. We will then prove that $\cos(2^n\pi\alpha)$ takes infinitely many values as $n$ runs over the positive integers. This will clearly contradict the hypothesis that $\alpha$ is rational. But since $\cos(2^n\pi\alpha) = 2\cos^2(2^{n-1}\pi\alpha) - 1$, it is easy to prove that we can write

\[\cos(2^n\pi\alpha) = \frac{a_n + b_n\sqrt{17}}{4}, \qquad b_{n+1} = a_n b_n, \quad a_{n+1} = \frac{a_n^2 + 17 b_n^2 - 8}{2}.\]

The previous relations yield by induction that $a_n, b_n$ are odd integers and that $a_{n+1} > a_n$. Thus $\cos(2^n\pi\alpha)$ takes infinitely many values. □
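The recurrence used in this proof is easy to test numerically. The short Python sketch below is only a sanity check under our own conventions: we start from $(a_0, b_0) = (-1, 1)$, which encodes $\cos\pi\alpha = \frac{-1+\sqrt{17}}{4}$, and apply $a_{n+1} = \frac{a_n^2 + 17b_n^2 - 8}{2}$, $b_{n+1} = a_n b_n$. The function name `doubling_sequence` is hypothetical.

```python
from math import acos, cos, sqrt


def doubling_sequence(steps):
    """Pairs (a_n, b_n) with cos(2^n * pi * alpha) = (a_n + b_n * sqrt(17)) / 4."""
    a, b = -1, 1  # encodes cos(pi * alpha) = (-1 + sqrt(17)) / 4
    pairs = [(a, b)]
    for _ in range(steps):
        numerator = a * a + 17 * b * b - 8
        assert numerator % 2 == 0  # the division by 2 is exact
        a, b = numerator // 2, a * b
        pairs.append((a, b))
    return pairs


def max_error(steps):
    """Largest gap between the closed form and the cosine computed directly."""
    theta = acos((-1 + sqrt(17)) / 4)  # theta plays the role of pi * alpha
    return max(
        abs((a + b * sqrt(17)) / 4 - cos(2 ** n * theta))
        for n, (a, b) in enumerate(doubling_sequence(steps))
    )
```

The first pairs are $(-1, 1)$, $(5, -1)$, $(17, -5)$, $(353, -85)$: all entries are odd and the $a_n$ increase, in agreement with the induction. Since the $a_n$ grow very quickly, the floating point comparison in `max_error` is only meaningful for the first few indices.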


Remark 9.7. In general, let us choose relatively prime integers $m, n$ with $n > 2$ and find the degree of the algebraic number $x = \cos\left(\frac{2\pi m}{n}\right)$. Define $z = e^{\frac{2i\pi m}{n}}$, a primitive $n$-th root of unity. The irreducibility of the cyclotomic polynomials (which is a very nontrivial theorem) implies that $z$ has degree $\varphi(n)$ as an algebraic number. On the other hand, we have

\[[\mathbb{Q}(z) : \mathbb{Q}] = [\mathbb{Q}(z) : \mathbb{Q}(x)] \cdot [\mathbb{Q}(x) : \mathbb{Q}]\]

and we have $[\mathbb{Q}(z) : \mathbb{Q}(x)] = 2$. Indeed, $2x = z + z^{-1}$, which implies that $z$ satisfies a quadratic equation with coefficients in $\mathbb{Q}(x)$, so $[\mathbb{Q}(z) : \mathbb{Q}(x)] \le 2$. On the other hand, we cannot have $\mathbb{Q}(z) = \mathbb{Q}(x)$, because $z$ is not a real number. Putting these observations together, we deduce that $x$ has degree $\frac{\varphi(n)}{2}$. Using the previous result and the fact that $\cos\left(\frac{\pi}{2} - x\right) = \sin x$, we can compute the degree of $\sin\frac{2\pi m}{n}$. The answer is a bit complicated: if $n \ne 4$, the degree of $\sin\frac{2\pi m}{n}$ is $\frac{\varphi(n)}{2}$ if $8$ divides $n$, $\frac{\varphi(n)}{4}$ if $\gcd(n, 8) = 4$ and $\varphi(n)$ if $\gcd(n, 8) < 4$.


6. Prove that none of the numbers $\sqrt{n+1} - \sqrt{n}$ for positive integers $n$ can be written in the form $2\cos\left(\frac{2k\pi}{m}\right)$ for some integers $k, m$.


Chinese Olympiad 


Proof. First, we will find a polynomial with $x = \sqrt{n+1} - \sqrt{n}$ as a root. We have $x^2 = 2n + 1 - 2\sqrt{n^2 + n}$ and so $(x^2 - 2n - 1)^2 = 4n^2 + 4n$, from where we easily find that $x^4 - 2(2n+1)x^2 + 1 = 0$. Note that the roots of the polynomial $f(X) = X^4 - 2(2n+1)X^2 + 1$ are $x = \pm\sqrt{n+1} \pm \sqrt{n}$. Next, we will find a polynomial with roots $x = 2\cos\frac{2k\pi}{m}$. Let $T_m$ be the $m$th Chebyshev polynomial, defined by the equality $T_m(\cos z) = \cos mz$ for all $z$. Then $T_m\left(\frac{x}{2}\right) = \cos 2k\pi = 1$. Thus the numbers $2\cos\frac{2k\pi}{m}$ for $k = 0, \ldots, m-1$ are roots of $g(X) = T_m\left(\frac{X}{2}\right) - 1$. These $m$ numbers are not distinct, but $2\cos\frac{2k\pi}{m} = 2\cos\frac{2(m-k)\pi}{m}$ for $1 \le k < m/2$ are double roots of this polynomial since $g$ achieves a local maximum at these points. Thus these are the only roots of $g(X)$.

If $\sqrt{n+1} - \sqrt{n} = 2\cos\frac{2k\pi}{m}$, then $f(X)$ and $g(X)$ have a common factor in $\mathbb{Z}[X]$. The only roots of $f$ which lie in the interval $[-2, 2]$ (which contains all roots of $g$) are $\sqrt{n+1} - \sqrt{n}$ and $\sqrt{n} - \sqrt{n+1}$. Therefore this common factor is either $X - (\sqrt{n+1} - \sqrt{n})$ or $X^2 - (\sqrt{n+1} - \sqrt{n})^2$. In either case we see that $(\sqrt{n+1} - \sqrt{n})^2 = 2n + 1 - 2\sqrt{n(n+1)}$ is an integer and hence $n(n+1)$ is a square. But this would make $4n(n+1)$ and $(2n+1)^2$ consecutive positive squares, a contradiction. □


We continue with a very beautiful and classical result ([50]) concerning 
linear equations in roots of unity. 




7. a) Suppose that $a_1, a_2, \ldots, a_k$ are rational numbers and $\zeta_1, \zeta_2, \ldots, \zeta_k$ are roots of unity such that $a_1\zeta_1 + a_2\zeta_2 + \cdots + a_k\zeta_k = 0$. Moreover, suppose that $\sum_{i \in I} a_i\zeta_i \ne 0$ for any nonempty proper subset $I$ of $\{1, 2, \ldots, k\}$. Prove that $\zeta_i^m = \zeta_j^m$ for all $i, j$, where $m$ is the product of primes smaller than or equal to $k$.

b) Let $z$ be a complex number. Prove that there are at most $2^{4k^2} \cdot k^k$ $k$-tuples $(\zeta_1, \zeta_2, \ldots, \zeta_k)$ of roots of unity with the following property: there exist rational numbers $a_1, a_2, \ldots, a_k$ such that $z = \sum_{i=1}^k a_i\zeta_i$ and $z \ne \sum_{i \in I} a_i\zeta_i$ for any proper subset $I$ of $\{1, 2, \ldots, k\}$.

Mann's theorem


Proof. a) We may assume that $a_1 = \zeta_1 = 1$. Let $m$ be the least positive integer such that $\zeta_i^m = 1$ for all $i$ and choose a prime factor $p$ of $m$. If $m = p^j n$ with $\gcd(n, p) = 1$, we will prove that $j = 1$ and $p \le k$. This will imply that $m$ divides $\prod_{p \le k} p$ and the first part of the theorem will follow. Proving this is however not a simple task.

We start with an observation: let $z = e^{\frac{2i\pi}{p^j}}$ and let $\zeta$ be an $m$-th root of unity. We claim that there exist $0 \le r < p$ and $x$ such that $x^{\frac{m}{p}} = 1$ and $\zeta = z^r \cdot x$. This is very easy: if $\zeta = e^{\frac{2i\pi l}{m}}$, simply choose $0 \le r < p$ such that $rn \equiv l \pmod p$.

Applying this observation to each $\zeta_i$, we can write $\zeta_i = z^{r_i} x_i$ with $r_i, x_i$ as above. We have $x_1 = 1$ and $r_1 = 0$. The equation $\sum_{i=1}^k a_i\zeta_i = 0$ can be written $\sum_{l=0}^{p-1} b_l z^l = 0$, where $b_l = \sum_{r_i = l} a_i x_i$. Note that $b_l \in \mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right)$. On the other hand, we can compute the degree of $z$ over $\mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right)$. Indeed, observe that $\mathbb{Q}\left(z, e^{\frac{2i\pi p}{m}}\right) = \mathbb{Q}\left(e^{\frac{2i\pi}{m}}\right)$, so that

\[\left[\mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right)(z) : \mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right)\right] = \frac{\left[\mathbb{Q}\left(e^{\frac{2i\pi}{m}}\right) : \mathbb{Q}\right]}{\left[\mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right) : \mathbb{Q}\right]} = \frac{\varphi(m)}{\varphi(m/p)} = \frac{\varphi(p^j)}{\varphi(p^{j-1})}\]

and the last quantity is $p - 1$ for $j = 1$ and $p$ otherwise.

Note that $\sum_{l=0}^{p-1} b_l X^l$ is not the zero polynomial, since otherwise we obtain the relation $\sum_{r_i = l} a_i\zeta_i = 0$ for all $0 \le l < p$. But the hypothesis yields then $\{i \mid r_i = l\} = \emptyset$ or $\{1, 2, \ldots, k\}$ for all $l$. As $r_1 = 0$, this gives $r_i = 0$ for all $i$ and so $\zeta_i^{\frac{m}{p}} = 1$ for all $i$, contradicting the minimality of $m$.

If we combine the results of the previous two paragraphs, we see that we must have $j = 1$, as $z$ is killed by the nonzero polynomial $\sum_{l=0}^{p-1} b_l X^l$, of degree at most $p - 1$. But then $z$ has degree $p - 1$ over $\mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right)$ (as follows from the previous computation) and so $\sum_{l=0}^{p-1} b_l X^l$ is a constant multiple of the minimal polynomial of $z$ over $\mathbb{Q}\left(e^{\frac{2i\pi p}{m}}\right)$. As $z$ is also killed by $1 + X + \cdots + X^{p-1}$, we deduce that these two polynomials differ by a constant factor. In particular, all $b_l$ are nonzero. So for all $0 \le l < p$ one can find $i$ such that $r_i = l$. Clearly, this implies that $p \le k$ and the proof is finished.

b) Fix a solution $z = \sum_{i=1}^k a_i\zeta_i$ of the equation and consider another solution, say $z = \sum_{i=1}^k b_i z_i$. Thus

\[\sum_{i=1}^k a_i\zeta_i - \sum_{i=1}^k b_i z_i = 0,\]

but one has to be a little bit careful, as this relation does not necessarily satisfy the conditions of (a). However, if we fix $1 \le i \le k$, we can find a minimal sub-relation of the previous relation which contains $z_i$. By hypothesis, such a sub-relation must contain some $\zeta_j$. As the length of this sub-relation is at most $2k$ and as it clearly satisfies the hypothesis of (a), we deduce that $z_i^m = \zeta_j^m$ for all $\zeta_j$ in this sub-relation. Here $m = \prod_{p \le 2k} p$. So, for any $i$, $z_i$ can take at most $km$ values and so the number of solutions of the equation in $z_1, z_2, \ldots, z_k$ is at most $(km)^k$. It remains to use Erdős's famous inequality (theorem 3.A.3) $\prod_{p \le n} p < 4^n$ to conclude. □


Remark 9.8. Let $a_1, a_2, \ldots, a_n$ be nonzero complex numbers and consider the equation $a_1 z_1 + a_2 z_2 + \cdots + a_n z_n = 1$. A non-degenerate solution is an $n$-tuple $(z_1, z_2, \ldots, z_n)$ of roots of unity which satisfies the equation and such that $\sum_{i \in I} a_i z_i \ne 0$ for any nonempty subset $I$ of $\{1, 2, \ldots, n\}$. Conway and Jones [20] improved Mann's theorem by proving that if $a_i \in \mathbb{Q}$, then for any non-degenerate solution we have $z_1^d = z_2^d = \cdots = z_n^d = 1$ where $d$ is the product of primes $p_1, p_2, \ldots, p_s$ such that $\sum_{i=1}^s (p_i - 2) \le n - 1$. Also, in [30] the author proves using rather elementary and very beautiful arguments that there are at most $(n+1)^{3(n+1)^2}$ non-degenerate solutions of the equation.




9.3 The gcd trick 


The division algorithm shows that if $K \subset L$ are fields and if $f, g \in K[X]$ are two polynomials, then their gcd is the same if we see $f, g$ as polynomials with coefficients in $K$ or with coefficients in $L$. That is, the greatest common divisor of two polynomials is not sensitive to the field in which the coefficients of these polynomials live. Combining this observation with Gauss' lemma, we also obtain that if $f$ and $g$ are monic polynomials with integer coefficients, then their gcd computed in $\mathbb{Q}[X]$ has integer coefficients. This gives a very indirect, but sometimes very useful way to prove the rationality or integrality of a real number $x$: it is enough to exhibit $X - x$ as the gcd of two polynomials with rational coefficients (respectively of two monic polynomials with integer coefficients). The next problems in this section illustrate this trick.


8. Let $a, b$ be two positive rational numbers such that for some $n \ge 2$ the number $\sqrt[n]{a} + \sqrt[n]{b}$ is rational. Prove that $\sqrt[n]{a}$ is also rational.


Marius Cavachi, Gazeta Matematica 


Proof. Let us write $\sqrt[n]{a} + \sqrt[n]{b} = c$ for some (positive) rational number $c$. Then $\sqrt[n]{a}$ is a root of $X^n - a$ and also of $(c - X)^n - b$. The key point is that it is the unique common root of these polynomials. Indeed, if $z$ is a common root, then we can write $z = \sqrt[n]{a}\, z_1$ and $c - z = \sqrt[n]{b}\, z_2$ for some $n$th roots of unity $z_1, z_2$. We deduce that $\sqrt[n]{a} + \sqrt[n]{b} = \sqrt[n]{a}\, z_1 + \sqrt[n]{b}\, z_2$. Since $|z_i| = 1$, the real parts of $z_1, z_2$ are at most $1$. Passing to real parts in the previous equality then implies that $z_1 = z_2 = 1$ and the claim is proved. Now, since the two polynomials don't have multiple roots, it follows that $\gcd(X^n - a, (c - X)^n - b) = X - \sqrt[n]{a}$. The result follows now from the gcd trick. □


9. Let $m, n$ be relatively prime numbers and let $x > 1$ be a real number such that $x^m + \frac{1}{x^m}$ and $x^n + \frac{1}{x^n}$ are integers. Prove that $x + \frac{1}{x}$ is also an integer.





Proof. Let $a = x^m + \frac{1}{x^m}$ and $b = x^n + \frac{1}{x^n}$ and consider the polynomials

\[p(X) = X^{2m} - aX^m + 1 = (X^m - x^m)(X^m - x^{-m})\]

and

\[q(X) = X^{2n} - bX^n + 1 = (X^n - x^n)(X^n - x^{-n}).\]

The crucial claim is that

\[\gcd(p, q) = X^2 - (x + x^{-1})X + 1 = (X - x)(X - x^{-1}).\]

Assuming this for a moment, we can conclude that $x + x^{-1}$ is an integer by the gcd trick.

It remains to establish the claim and for that it is enough to prove that $x$ and $x^{-1}$ are the only common zeros of $p, q$ (since clearly $p, q$ have no multiple root). But if $z$ is a common zero, we have $z^m = x^m$ or $z^m = x^{-m}$ and similarly $z^n = x^n$ or $z^n = x^{-n}$. We may assume (by changing $z$ and $z^{-1}$) that $z^m = x^m$, so that $|z| > 1$. Then clearly we must have $z^n = x^n$. But then $z/x$ is a root of unity whose order divides both $m$ and $n$. Since $\gcd(m, n) = 1$, it follows that $z = x$ and we are done. □
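To see the gcd trick at work on concrete numbers, here is a small Python sketch of the Euclidean algorithm in $\mathbb{Q}[X]$, using exact rational arithmetic. It is an illustration only; the helper names (`trim`, `poly_mod`, `poly_gcd`) and the convention that a polynomial is a list of coefficients, lowest degree first, are ours. The two examples are chosen by us: $a = 8$, $b = 27$, $n = 3$, $c = 5$ for problem 8, and $x = \frac{3 + \sqrt{5}}{2}$, $m = 2$, $n = 3$ (so that $x^2 + x^{-2} = 7$ and $x^3 + x^{-3} = 18$) for problem 9.

```python
from fractions import Fraction


def trim(poly):
    """Drop leading zero coefficients in place (lowest degree first)."""
    while poly and poly[-1] == 0:
        poly.pop()
    return poly


def poly_mod(num, den):
    """Remainder of num modulo the nonzero polynomial den in Q[X]."""
    num = [Fraction(c) for c in num]
    while len(num) >= len(den):
        factor = num[-1] / den[-1]
        offset = len(num) - len(den)
        for i, d in enumerate(den):
            num[offset + i] -= factor * d
        trim(num)  # the leading coefficient is now exactly zero
    return num


def poly_gcd(f, g):
    """Monic gcd of f and g in Q[X], by the Euclidean algorithm."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        f, g = g, poly_mod(f, g)
    return [c / f[-1] for c in f]


# Problem 8 with a = 8, b = 27, n = 3, c = 5:
# gcd(X^3 - 8, (5 - X)^3 - 27) should be X - 2.
example_8 = poly_gcd([-8, 0, 0, 1], [98, -75, 15, -1])

# Problem 9 with m = 2, n = 3 and x + 1/x = 3:
# gcd(X^4 - 7X^2 + 1, X^6 - 18X^3 + 1) should be X^2 - 3X + 1.
example_9 = poly_gcd([1, 0, -7, 0, 1], [1, 0, 0, -18, 0, 0, 1])
```

In both cases the gcd has rational (here even integer) coefficients although it was defined through irrational or complex roots, which is exactly the point of the trick.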


The following problem is very similar to the previous problem, but a bit 
more difficult. 


10. Let $\theta \in (0, \pi/2)$ be an angle such that $\cos\theta$ is irrational. Suppose that $\cos k\theta$ and $\cos[(k+1)\theta]$ are rational for some positive integer $k$. Prove that $\theta = \pi/6$.


USA TST 2007 


Proof. We will actually prove more: the result still holds if we replace $k+1$ by any positive integer $l$ which is relatively prime to $k$. The key point is the following

Lemma 9.9. If $\cos k\theta$ and $\cos l\theta$ are rational for relatively prime positive integers $k, l$, then either $\cos\theta$ is rational or $\theta$ is a rational multiple of $\pi$.

Proof. If $\cos k\theta = p$ and $\cos l\theta = q$, then $e^{i\theta}$ is a common root of the polynomials

\[f(X) = X^{2k} - 2pX^k + 1, \qquad g(X) = X^{2l} - 2qX^l + 1.\]

On the other hand, it is not difficult to check that if $\theta$ is not a rational multiple of $\pi$, then $e^{i\theta}$ and $e^{-i\theta}$ are the only common roots of $f$ and $g$. Indeed, all roots of $f$ are $e^{\pm i\theta + \frac{2i\pi j}{k}}$ for $0 \le j < k$ and all roots of $g$ are $e^{\pm i\theta + \frac{2i\pi j}{l}}$ with $0 \le j < l$. On the other hand, since $\gcd(k, l) = 1$, the only solution of the equation $e^{\pm i\theta + \frac{2i\pi j_1}{k}} = e^{\pm i\theta + \frac{2i\pi j_2}{l}}$ with $0 \le j_1 < k$ and $0 \le j_2 < l$ (for some choices of signs) is $j_1 = j_2 = 0$. This proves that the greatest common divisor of $f$ and $g$ is precisely $(X - e^{i\theta})(X - e^{-i\theta}) = X^2 - 2\cos\theta\, X + 1$, thus $\cos\theta$ is rational. □

Coming back to the proof, the previous lemma shows that $\theta$ is a rational multiple of $\pi$. On the other hand, we saw in section 9.2 that the only rational numbers $r \in [0, 1]$ such that $\cos r\pi$ is rational are $r = 0, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, 1$. We deduce that $k\theta$ and $l\theta$ are integer multiples of $\frac{\pi}{6}$. Since $\gcd(k, l) = 1$, Bézout's lemma implies that $\theta$ is an integer multiple of $\frac{\pi}{6}$. Since $\cos\theta$ is irrational, we deduce that $\theta = \frac{\pi}{6}$. □


9.4 The theorem of symmetric polynomials 


The proof of the following result is quite elementary, but the result itself is incredibly powerful and useful. If $R$ is a commutative ring and if $f \in R[X_1, X_2, \ldots, X_n]$ is a polynomial, we say that $f$ is symmetric if for all permutations $\sigma$ of $\{1, 2, \ldots, n\}$ we have

\[f(X_1, \ldots, X_n) = f(X_{\sigma(1)}, \ldots, X_{\sigma(n)}).\]

Recall that the fundamental symmetric polynomials are

\[\sigma_k = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} X_{i_1} X_{i_2} \cdots X_{i_k}\]

for $1 \le k \le n$. We have the equality

\[(t + X_1)(t + X_2) \cdots (t + X_n) = t^n + \sigma_1 t^{n-1} + \cdots + \sigma_n \in R[t, X_1, \ldots, X_n].\]


Theorem 9.10. (Fundamental theorem of symmetric polynomials.) Let $R$ be a commutative ring and let $f \in R[X_1, \ldots, X_n]$ be a symmetric polynomial. Then there is $g \in R[X_1, \ldots, X_n]$ such that $f(X_1, \ldots, X_n) = g(\sigma_1, \sigma_2, \ldots, \sigma_n)$.




Proof. We will use induction on $n$ and, inside the induction step, an induction on $\deg(f)$. For $n = 1$ everything is clear, so assume the result holds for $n - 1$. We now prove by induction on $\deg(f)$ the assertion of the theorem with $n$ variables. If $\deg(f) = 0$ or $1$, everything is clear. It is clear that the polynomial $g(X_1, \ldots, X_{n-1}) = f(X_1, \ldots, X_{n-1}, 0)$ is still symmetric, so by (the first) induction it is a polynomial of the form $h(X_1 + \cdots + X_{n-1}, \ldots, X_1 \cdots X_{n-1})$ for some $h \in R[X_1, \ldots, X_{n-1}]$. Note that the difference

\[f(X_1, \ldots, X_n) - h(X_1 + \cdots + X_n, \ldots, X_2 \cdots X_n + \cdots + X_1 X_2 \cdots X_{n-1})\]

vanishes when $X_n = 0$ and is a symmetric polynomial. Therefore this polynomial is a multiple of $X_1 \cdots X_n$. Applying the inductive hypothesis to the quotient between this polynomial and $X_1 \cdots X_n$ (which has degree less than $\deg f$), the result follows. □


Remark 9.11. It is not difficult to prove that the polynomial $g$ is unique. This means that there are no algebraic relations between the polynomials $\sigma_1, \sigma_2, \ldots, \sigma_n$.

Remark 9.12. The theorem also implies that any symmetric rational function $f \in R(X_1, X_2, \ldots, X_n)$ is a rational function in the $\sigma_i$'s. Indeed, let

\[\sigma \cdot P(X_1, X_2, \ldots, X_n) = P(X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(n)})\]

for $P \in R[X_1, X_2, \ldots, X_n]$. Then we can write

\[f = \frac{P}{Q} = \frac{P_1}{\prod_{\sigma \in S_n} \sigma \cdot Q}\]

for some polynomials $P, Q, P_1$. Since $f$ is symmetric, so is $P_1$.

The result follows from the theorem of symmetric polynomials applied to $P_1$ and to $\prod_{\sigma} \sigma \cdot Q$.
Remark 9.13. We refer the reader to [66], chapter 5 for the proof of the following theorem of Lagrange: let $K$ be a field of characteristic $0$. If $f \in K(X_1, X_2, \ldots, X_n)$, let $G_f$ be the set (actually group) of those permutations $\sigma \in S_n$ such that $f(X_1, X_2, \ldots, X_n) = f(X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(n)})$. If $f, g \in K(X_1, X_2, \ldots, X_n)$ satisfy $G_f \subset G_g$, then one can find a rational function $h$ whose coefficients are symmetric polynomials in $X_1, X_2, \ldots, X_n$ such that $g = h(f)$.

A very important consequence of theorem 9.10 is the following result, that 
will be constantly used in this section. 


Corollary 9.14. a) Let $f \in \mathbb{Q}[X_1, X_2, \ldots, X_n]$ be a symmetric polynomial and let $g \in \mathbb{Q}[X]$ be a polynomial of degree $n$, with complex roots $z_1, z_2, \ldots, z_n$. Then $f(z_1, z_2, \ldots, z_n) \in \mathbb{Q}$.

b) If $f$ has integer coefficients and if $g$ is monic with integer coefficients, then $f(z_1, z_2, \ldots, z_n)$ is an integer.

Proof. Using theorem 9.10, we can write

\[f(X_1, \ldots, X_n) = h(\sigma_1, \sigma_2, \ldots, \sigma_n)\]

for some $h \in \mathbb{Q}[X_1, \ldots, X_n]$ (resp. $\mathbb{Z}[X_1, \ldots, X_n]$). The result follows from the fact that the $\sigma_i(z_1, z_2, \ldots, z_n)$ are rational (respectively integers), because the coefficients of $g$ are so. □


Another very useful result is the following generalization of Fermat’s little 
theorem. 


Corollary 9.15. Let $f \in \mathbb{Z}[X]$ be a monic polynomial with complex roots $z_1, z_2, \ldots, z_n$ (multiplicities counted) and let $p$ be a prime number. Then

\[z_1^p + z_2^p + \cdots + z_n^p \equiv (z_1 + z_2 + \cdots + z_n)^p \pmod p.\]

Proof. Corollary 9.14 implies that both sides are integers. Consider the quotient by $p$ of the difference between the left-hand side and the right-hand side. Using the multinomial formula, it is easy to see that this quotient is a symmetric polynomial with integer coefficients in $z_1, z_2, \ldots, z_n$, thus the result follows from corollary 9.14. □


Here is a nice application of the previous corollary. It was one of the 
difficult problems given in the Romanian IMO Team Selection Tests in 2004. 




11. Let $a, b, c$ be integers. Define the sequence $(x_n)_{n \ge 0}$ by $x_0 = 4$, $x_1 = 0$, $x_2 = 2c$, $x_3 = 3b$ and $x_{n+3} = a x_{n-1} + b x_n + c x_{n+1}$. Prove that for any prime $p$ and any positive integer $m$, the number $x_{p^m}$ is divisible by $p$.


Calin Popescu, Romanian TST 2004 


Proof. Let $r_1, r_2, r_3, r_4$ be the roots of the characteristic polynomial of the recurrence relation, namely $X^4 - cX^2 - bX - a$. The crucial point and by far the hardest step in the proof is to realize that

\[x_n = r_1^n + r_2^n + r_3^n + r_4^n\]

for all $n$. This is suggested by $x_0 = 4$ and by the fact that problem creators tend to try to be sneaky.¹ Proving the previous formula is immediate by induction, once we prove it for $n = 0, 1, 2, 3$. For $n = 0, 1$ this is trivial, for $n = 2$ it follows from the identity

\[\sum r_i^2 = \left(\sum r_i\right)^2 - 2\sum_{i<j} r_i r_j = 2c\]

and for $n = 3$ we can use the recursive relation (since it is easy to see that $y_n = r_1^n + r_2^n + r_3^n + r_4^n$ together with $y_{-1} = -b/a$ satisfies the same recursive relation as $x_n$). With this closed formula for the general term of the sequence, we need to prove that $\sum r_i^{p^m}$ is a multiple of $p$. Since $\sum r_i = 0$, the result follows from corollary 9.15 and by induction on $m$. □
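The statement of problem 11 is easy to test by brute force. The following Python sketch, with a function name of our own choosing, simply generates the sequence from the recurrence $x_N = a x_{N-4} + b x_{N-3} + c x_{N-2}$ (the recurrence of the problem with $N = n + 3$) so that one can check that $p$ divides $x_{p^m}$ for small parameters. It is a numerical check only and does not replace the proof.

```python
def popescu_sequence(a, b, c, length):
    """First `length` terms of x_0 = 4, x_1 = 0, x_2 = 2c, x_3 = 3b,
    x_N = a*x_{N-4} + b*x_{N-3} + c*x_{N-2} for N >= 4."""
    x = [4, 0, 2 * c, 3 * b]
    while len(x) < length:
        n = len(x)  # index of the term being computed
        x.append(a * x[n - 4] + b * x[n - 3] + c * x[n - 2])
    return x[:length]


def check_divisibility(a, b, c, prime_powers):
    """True if p divides x_{p^m} for every pair (p, m) in prime_powers."""
    top = max(p ** m for p, m in prime_powers)
    x = popescu_sequence(a, b, c, top + 1)
    return all(x[p ** m] % p == 0 for p, m in prime_powers)
```

For $a = b = c = 1$ the sequence starts $4, 0, 2, 3, 6, 5, 11, 14, 22, 30$, and one sees directly that $2 \mid x_2$, $3 \mid x_3$, $2 \mid x_4$, $5 \mid x_5$, $7 \mid x_7$, $2 \mid x_8$ and $3 \mid x_9$.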


Let us consider now a few more or less direct applications of theorem 9.10 
and of corollary 9.14. 


12. a) Let $P, R$ be polynomials with rational coefficients such that $P \ne 0$. Prove that there exists a non-zero polynomial $Q \in \mathbb{Q}[X]$ such that $P(X) \mid Q(R(X))$.

b) Let $P, R$ be polynomials with integer coefficients and suppose that $P$ is monic. Prove that there exists a monic polynomial $Q \in \mathbb{Z}[X]$ such that $P(X) \mid Q(R(X))$.


Iranian Olympiad 2006 


¹ As Richard Stong kindly remarks...




Proof. The idea is very natural: the first condition that should be satisfied in order to have $P(X) \mid Q(R(X))$ is that for each root $z$ of $P$ we have $Q(R(z)) = 0$. Therefore, if $x_1, x_2, \ldots, x_n$ are the roots of $P$ (some of the $x_i$'s may be equal), then we would like to have $Q(R(x_i)) = 0$. The most natural choice is to take

\[Q(X) = \prod_{i=1}^n (X - R(x_i)).\]

Note that it satisfies $P(X) \mid Q(R(X))$, because $X - x_i$ divides $R(X) - R(x_i)$ for all $i$. It remains to check that $Q$ has rational (respectively integer, for the second part of the problem) coefficients. This follows from corollary 9.14. □
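The polynomial $Q$ can also be computed exactly without knowing the roots of $P$. The Python sketch below uses a different technique from the proof above, and this substitution is our own: the numbers $R(x_i)$ are the eigenvalues of the matrix $R(C)$, where $C$ is the companion matrix of the monic polynomial $P$, so $Q$ is the characteristic polynomial of $R(C)$. We compute it with the Faddeev-LeVerrier recursion over the rationals. All function names are hypothetical, polynomials are lists of coefficients with the lowest degree first, and $P$ is assumed monic.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def companion(P):
    """Companion matrix of the monic polynomial P."""
    n = len(P) - 1
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)  # ones below the diagonal
    for i in range(n):
        C[i][n - 1] = -Fraction(P[i])  # last column carries -P's coefficients
    return C


def poly_of_matrix(R, C):
    """Evaluate the polynomial R at the square matrix C (Horner's scheme)."""
    n = len(C)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(R):
        result = mat_mul(result, C)
        for i in range(n):
            result[i][i] += coeff
    return result


def char_poly(A):
    """Coefficients of det(X*I - A), lowest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = mat_mul(A, M)
        for i in range(n):
            M[i][i] += coeffs[n - k + 1]
        AM = mat_mul(A, M)
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return coeffs


def annihilator(P, R):
    """Q(X) = prod (X - R(x_i)) over the roots x_i of the monic polynomial P."""
    return char_poly(poly_of_matrix(R, companion(P)))
```

For example, with $P = X^2 - 2$ and $R = X^2 + X$ we have $R(\pm\sqrt{2}) = 2 \pm \sqrt{2}$, and `annihilator([-2, 0, 1], [0, 1, 1])` gives the coefficients of $X^2 - 4X + 2$, which indeed has integer coefficients.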


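13. a) Let $a_1, a_2, \ldots, a_m, b_1, b_2, \ldots, b_n \in \mathbb{C}$ be such that

\[f_1(X) = \prod_{i=1}^m (X - a_i), \quad f_2(X) = \prod_{i=1}^n (X - b_i) \in \mathbb{Z}[X].\]

Suppose that there exist $g_1, g_2 \in \mathbb{Z}[X]$ such that $f_1 g_1 + f_2 g_2 = 1$. Prove that

\[\left|\prod_{i=1}^m \prod_{j=1}^n (a_i - b_j)\right| = 1.\]

b) If $a_i, b_i$ are integers and

\[\left|\prod_{i=1}^m \prod_{j=1}^n (a_i - b_j)\right| = 1,\]

prove that there exist polynomials $g_1, g_2 \in \mathbb{Z}[X]$ such that $f_1 g_1 + f_2 g_2 = 1$.

Ibero-American Olympiad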


Proof. a) Note that the relation to be proved can also be written as

\[\left|\prod_{i=1}^m f_2(a_i)\right| = 1.\]

Evaluating the relation $f_1 g_1 + f_2 g_2 = 1$ at $a_i$ yields $f_2(a_i) g_2(a_i) = 1$. Thus

\[\left|\prod_{i=1}^m f_2(a_i)\right| \cdot \left|\prod_{i=1}^m g_2(a_i)\right| = 1.\]

On the other hand, $\prod_{i=1}^m f_2(a_i)$ and $\prod_{i=1}^m g_2(a_i)$ are integers, by corollary 9.14. These two observations are enough to conclude.

b) Let $A$ and $B$ be the sets of the $a_i$'s and of the $b_j$'s. Note that $|a_i - b_j| = 1$ for all $i, j$, because $a_i, b_j$ are integers. It is then immediate that we have only two possible cases:

1) $A, B$ are singletons of the form $\{a\}, \{a+1\}$ or $\{a+1\}, \{a\}$.

2) $A = \{a\}$ and $B = \{a-1, a+1\}$ for some integer $a$ or $B = \{a\}$ and $A = \{a-1, a+1\}$.

Thus, by symmetry in $A$ and $B$ and by making a translation of the variable $X \to X - a$, it is enough to consider the cases when $A = \{0\}$, $B = \{1\}$ and $A = \{0\}$, $B = \{-1, 1\}$. In each case $f_1$ divides some $X^n$ and $f_2$ divides some $(X^2 - 1)^m$. Thus $f_1$ divides $X^{2^k}$ and $f_2$ divides $(X^{2^k} - 1)^n$ for $k, n$ sufficiently large. It is thus enough to find Bézout relations with integral coefficients for the polynomials $X^{2^k}$ and $(X^{2^k} - 1)^n$. But this is immediate. □














Remark 9.16. The assumption that $a_i$ and $b_i$ are integers is useless. Here is a proof, due to Richard Stong. We advise the reader not familiar with the notion of resultant to read the discussion before problem 27 in chapter 12. It is not too difficult to check that the resultant of $f_1$ and $f_2$ is $\prod_{i=1}^m \prod_{j=1}^n (b_j - a_i) = \pm 1$. But then the map

\[\varphi : \mathbb{Z}[X]_{\deg(f_2) - 1} \times \mathbb{Z}[X]_{\deg(f_1) - 1} \to \mathbb{Z}[X]_{\deg(f_1) + \deg(f_2) - 1}\]

defined by $\varphi(g_1, g_2) = g_1(X) f_1(X) + g_2(X) f_2(X)$ is invertible, thus we can find $g_1, g_2 \in \mathbb{Z}[X]$ such that $f_1 g_1 + f_2 g_2 = 1$.
A classical problem is to prove that 


\[|a + b\sqrt{2}| \ge \frac{1}{|a| + |b|\sqrt{2}}\]


for any integers $a, b$, not both of them equal to $0$. The idea is that it is not clear how to deal with $|a + b\sqrt{2}|$ directly, but it is very easy to say something about the product of this number and its conjugate $|a - b\sqrt{2}|$. Indeed, this product is a nonzero integer, thus at least $1$. The result follows immediately. With a similar idea, it is not difficult to prove the following absolutely classical theorem of Liouville: if $z$ is an algebraic irrational number of degree $d$, then there exists $c > 0$ such that for all integers $p$ and $q$ with $q \ne 0$ we have $\left|z - \frac{p}{q}\right| > \frac{c}{|q|^d}$. That is, irrational algebraic numbers are badly approximable with rational numbers. A much deeper result, for which Roth won the Fields medal, is that we can improve the previous inequality to $\left|z - \frac{p}{q}\right| > \frac{c(\varepsilon)}{|q|^{2+\varepsilon}}$ for all $\varepsilon > 0$. The following problems use this trick of multiplying by conjugates and estimating the conjugates, but they are much more challenging than the very simple example discussed above.


14. Let $k, n$ be positive integers and let $P(X)$ be a polynomial of degree $n$ with all coefficients in the set $\{-1, 0, 1\}$. Suppose that $(X - 1)^k \mid P(X)$ and that there exists a prime $q$ such that $\frac{q}{\ln q} < \frac{k}{\ln(n+1)}$. Prove that the primitive complex roots of unity of order $q$ are roots of $P$.


IMC 2001 


Proof. The problem looks rather complicated because of the strange inequality imposed on $q$. Let us forget first about that and consider the product of all values of $P$ at the primitive $q$th roots of unity $z_1, \ldots, z_{q-1}$, namely $\prod_{i=1}^{q-1} P(z_i)$. This is an integer, by corollary 9.14. If it is not $0$, then $\prod_{i=1}^{q-1} |P(z_i)| \ge 1$. However, by assumption there exists a polynomial $Q$ (necessarily with integer coefficients) such that $P(X) = (1 - X)^k Q(X)$. Therefore

\[\prod_{i=1}^{q-1} P(z_i) = \left(\prod_{i=1}^{q-1} (1 - z_i)\right)^k \prod_{i=1}^{q-1} Q(z_i).\]

Since

\[\prod_{i=1}^{q-1} (X - z_i) = 1 + X + \cdots + X^{q-1},\]

we have $\prod_{i=1}^{q-1} (1 - z_i) = q$ and so

\[\prod_{i=1}^{q-1} P(z_i) = q^k \prod_{i=1}^{q-1} Q(z_i).\]

But the same argument as before shows that $\prod_{i=1}^{q-1} Q(z_i)$ is a nonzero integer, therefore $\prod_{i=1}^{q-1} |P(z_i)| \ge q^k$. This is however impossible, since by assumption we have $|P(z_i)| \le n + 1$, therefore

\[\prod_{i=1}^{q-1} |P(z_i)| \le (n+1)^{q-1} < q^k.\]

The previous arguments show that $P$ vanishes at one of the primitive roots of unity of order $q$. But since the polynomial $1 + X + \cdots + X^{q-1}$ is irreducible over the rational numbers, if $P$ vanishes at a primitive root of unity of order $q$, then it also vanishes at all the other roots. This ends the proof. □


One needs some gymnastics if one wants to avoid the use of Galois theory 
for the following problem. 


15. Let $p$ be a prime and let $n_1, n_2, \ldots, n_k$ be integers. Define

\[S = \sum_{j=1}^k \cos\left(\frac{2\pi n_j}{p}\right).\]

Prove that either $S = 0$ or $|S| \ge k\left(\frac{1}{2k}\right)^{\frac{p-1}{2}}$.

Holden Lee


Proof. Let $z = e^{\frac{2i\pi}{p}}$. The crucial ingredient in the proof is the following:

Lemma 9.17. The number

\[N = \prod_{l=1}^{\frac{p-1}{2}} \left(\sum_{j=1}^k \left(z^{l n_j} + z^{-l n_j}\right)\right)\]

is an integer. Moreover, $N = 0$ if and only if $S = 0$.


Let us admit this for a moment and see how we can finish the proof.
Assume that $S \ne 0$. Then $|N| \ge 1$. So

$$1 \le |N| = 2^{\frac{p-1}{2}} \cdot |S| \cdot \prod_{l=2}^{\frac{p-1}{2}} \Big| \sum_{j=1}^{k} \cos\frac{2\pi l n_j}{p} \Big| \le 2^{\frac{p-1}{2}} \cdot |S| \cdot k^{\frac{p-3}{2}}$$

and the conclusion follows.

Now, let us prove the lemma. As is well-known (and easily proved by
induction) there are polynomials $F_j \in Z[X]$ of degree j such that $X^j + X^{-j} =
F_j(X + X^{-1})$. Let $F = \sum_{i=1}^{k} F_{n_i}$. Then $N = \prod_{l=1}^{\frac{p-1}{2}} F(z^l + z^{-l})$. The lemma
will be proved if we prove the following result:


Lemma 9.18. The minimal polynomial of $z + z^{-1}$ is

$$f(X) = \prod_{l=1}^{\frac{p-1}{2}} \big(X - (z^l + z^{-l})\big) = F_{\frac{p-1}{2}} + F_{\frac{p-3}{2}} + \cdots + F_1 + 1.$$


Proof. Note that

$$f(X + X^{-1}) = X^{-\frac{p-1}{2}} \prod_{l=1}^{\frac{p-1}{2}} (X - z^l)(X - z^{-l}) = \frac{1 + X + \cdots + X^{p-1}}{X^{\frac{p-1}{2}}}.$$

Thus $f = F_{\frac{p-1}{2}} + F_{\frac{p-3}{2}} + \cdots$ by definition of the polynomials $F_j$. In particular,
f has integer coefficients and so N is an integer (by the fundamental theorem of
symmetric polynomials). Moreover, f has degree $\frac{p-1}{2}$ and vanishes at $z + z^{-1}$.
But

$$[Q(z + z^{-1}) : Q] = \frac{[Q(z) : Q]}{[Q(z) : Q(z + z^{-1})]} \ge \frac{p-1}{2},$$

so f must be the minimal polynomial of $z + z^{-1}$. □


Now, assume that N = 0. Thus, there exists $1 \le l \le \frac{p-1}{2}$ such that
$F(z^l + z^{-l}) = 0$. By the lemma, F is a multiple of f and so F vanishes at
$z + z^{-1}$. But this means that S = 0, a contradiction. Thus, we have proved
the crucial claim and the result follows. □


The following result is certainly classical, but it is rather difficult to find an 
elementary proof in the literature. We follow one of the approaches proposed 
in the beautiful article [10]. 


16. Let $a_1, a_2, \ldots, a_n$ be positive rational numbers and let $k_1, k_2, \ldots, k_n$ be
integers greater than 1. If $\sqrt[k_1]{a_1} + \sqrt[k_2]{a_2} + \cdots + \sqrt[k_n]{a_n}$ is a rational number,
then any term of the previous sum is also a rational number.


Proof. It is clearly enough to prove the following result: let k > 1 and suppose
that the positive rational numbers $a_1, \ldots, a_n, b_1, \ldots, b_n$ satisfy

$$a_1 \sqrt[k]{b_1} + \cdots + a_n \sqrt[k]{b_n} \in Q.$$

Then $\sqrt[k]{b_i} \in Q$ for all i.
Let

$$A_i = \{\text{roots of } X^k - a_i^k b_i\} = \{\omega^j a_i \sqrt[k]{b_i} \mid 1 \le j \le k\},$$

where $\omega$ is a primitive root of order k of 1. Also, let

$$S = \sum_{i=1}^{n} a_i \sqrt[k]{b_i}$$

and

$$P(X) = \prod_{r_2 \in A_2, \ldots, r_n \in A_n} (S - X - r_2 - \cdots - r_n).$$

By theorem 9.14 we have $P \in Q[X]$. Note that $P(a_1 \sqrt[k]{b_1}) = 0$. Let d be the
least positive divisor of k for which $\sqrt[k]{b_1^d} \in Q$ (it exists, as $\sqrt[k]{b_1^k} \in Q$). If we
manage to prove that d = 1, it will follow that $\sqrt[k]{b_1} \in Q$, so we can delete the
first term of S and conclude by induction on n. So, let us prove that d = 1.

By definition, we can write $a_1 \sqrt[k]{b_1} = \sqrt[d]{x}$ with $x \in Q_+$. The crucial fact is the
following:


Lemma 9.19. $X^d - x$ is irreducible in $Q[X]$.


Proof. If F is a monic polynomial with rational coefficients of degree between
1 and d - 1 that divides $X^d - x$, all roots of F have absolute value $\sqrt[d]{x}$ and so
$|F(0)| = (\sqrt[d]{x})^{\deg F}$ is a rational number, that is $\sqrt[k]{b_1^{\deg F}} \in Q$, contradicting
the minimality of d. □


Since $P(\sqrt[d]{x}) = 0$, the previous lemma yields $X^d - x \mid P$ in $Q[X]$. Thus,
if z is a primitive root of order d of 1, we have $P(z \sqrt[d]{x}) = 0$ and so there are
$(x_2, \ldots, x_n) \in A_2 \times A_3 \times \cdots \times A_n$ with $S - z \sqrt[d]{x} = x_2 + \cdots + x_n$. If $d \ge 2$, then
Re(z) < 1, so

$$\begin{aligned}
\mathrm{Re}(S) = S &= \mathrm{Re}(z \sqrt[d]{x} + x_2 + \cdots + x_n) \\
&\le \mathrm{Re}(z \sqrt[d]{x}) + \sum_{i=2}^{n} |x_i| \\
&= \mathrm{Re}(z \sqrt[d]{x}) + \sum_{i=2}^{n} a_i \sqrt[k]{b_i} \\
&< \sqrt[d]{x} + \sum_{i=2}^{n} a_i \sqrt[k]{b_i} \\
&= S,
\end{aligned}$$

a contradiction. So d = 1 and $a_1 \sqrt[k]{b_1} \in Q$. □


9.5 Ideal theory and local methods 


We strongly advise the reader not familiar with algebraic number theory 
to read the appendices on number fields and p-adic numbers before reading 
this section, which is short but rather challenging. 

We start with a beautiful result of Pólya concerning linear recurrence
sequences. Recall that a sequence $(a_n)_n$ is called a linear recurrence sequence
if one can find d and $x_1, \ldots, x_d$ such that $a_{n+d} + x_1 a_{n+d-1} + \cdots + x_d a_n = 0$
for all n.


17. Suppose that $(a_n)_{n \ge 1}$ is a linear recurrence sequence of integers such
that n divides $a_n$ for all positive integers n. Prove that $\left(\frac{a_n}{n}\right)_n$ is also a
linear recurrence sequence.


Pólya


Proof. By the general theory of linear recurrence sequences, we can find distinct
nonzero algebraic numbers $z_1, z_2, \ldots, z_m$ and polynomials $f_1, f_2, \ldots, f_m$
with algebraic coefficients such that

$$a_n = f_1(n) z_1^n + f_2(n) z_2^n + \cdots + f_m(n) z_m^n$$

for all n. We will prove that if $n \mid a_n$ for all n, then $f_i(0) = 0$ for all i, from
which the result follows easily.

Let K be the field obtained by adjoining to Q all $z_i$'s and all coefficients
of the polynomials $f_i$. Choose a prime p which does not divide any of the
norms of the (nonzero) coefficients of the $f_i$'s or the norm of any of the $z_i$'s. All
sufficiently large primes satisfy this property. Fix such a prime p and consider
a prime ideal I of K over p, with norm $N(I) = p^f$. Impose the condition that
$j p^f$ divides $a_{j p^f}$. Note that

$$a_{j p^f} \equiv \sum_{i=1}^{m} f_i(0) z_i^j \pmod{I},$$

since $z_i^{p^f} \equiv z_i \pmod{I}$ and since $p \in I$. Thus, we must have

$$\sum_{i=1}^{m} f_i(0) z_i^j \equiv 0 \pmod{I}$$

for all j = 0, 1, ..., m - 1. Seeing this as a linear system in the $f_i(0)$'s, it
follows that $f_i(0) \in I$ for all i, unless I divides the determinant of the matrix
associated to this system. However, this is a Vandermonde determinant in
the $z_i$'s and so, if we ensure that I and $\prod_{i<j} (z_i - z_j)$ are relatively prime, we
will be able to conclude that $f_i(0) \in I$. But to ensure the last property, it is
enough to choose a prime p which does not divide the norm of the algebraic
number $\prod_{i<j} (z_i - z_j)$. Again, all sufficiently large primes have this property.

The previous paragraph shows that we can find infinitely many primes p
and for each such prime an ideal I over p such that $f_i(0) \in I$ for all i. But then
p will divide the norm of $f_i(0)$ for infinitely many primes p and so $f_i(0) = 0$
for all i. It is then clear that $\frac{a_n}{n}$ is still a linear recurrence sequence. □


We present two approaches for the following challenging problem: a rather 
exotic elementary one and a more advanced approach which uses standard 
facts about number fields and their p-adic completions. 


18. Let $a_1, a_2, \ldots, a_n$ be complex numbers such that $a_1^m + a_2^m + \cdots + a_n^m$ is an
integer for all positive integers m. Prove that $(X - a_1)(X - a_2) \cdots (X - a_n) \in Z[X]$.


Michael Larsen, AMM E 2993 


Proof. Let

$$\sigma_k = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} a_{i_1} a_{i_2} \cdots a_{i_k}$$

and $P_k = a_1^k + a_2^k + \cdots + a_n^k$, so Newton's identities² can be written (for
$1 \le k \le n$)

$$P_k - \sigma_1 P_{k-1} + \sigma_2 P_{k-2} - \cdots + (-1)^k k \sigma_k = 0.$$

It follows immediately from these relations that if $P_k \in Z$ for all k, then
$\sigma_k \in \frac{1}{n!} Z$ for all $1 \le k \le n$. In particular, $(X - a_1)(X - a_2) \cdots (X - a_n) \in \frac{1}{n!} Z[X]$
and so the $n! \cdot a_i$ are algebraic integers. Observe that if $a_1, a_2, \ldots, a_n$ satisfy the
conditions of the problem, then so do $a_1^r, a_2^r, \ldots, a_n^r$ for all $r \ge 1$. We deduce
that $n! \, a_i^r$ is an algebraic integer for all r. The next lemma shows that all
$a_i$'s are algebraic integers, so the coefficients of $(X - a_1)(X - a_2) \cdots (X - a_n)$
are algebraic integers. Since these coefficients are rational numbers (this has
already been established), they must be integers and the result follows.


Lemma 9.20. Let n be a positive integer and a be an algebraic number. If
$n a^k$ is an algebraic integer for all positive integers k, then a is an algebraic
integer.


²See the remark 9.22 for a proof of these.


Proof. If $d_j$ is the degree of the algebraic number $a^j$, it is clear that $d_{2^k} \ge d_{2^{k+1}}$
(because $Q(a^{2^{k+1}}) \subset Q(a^{2^k})$). Thus there exists an integer j and a positive
integer d such that $d_{2^k} = d$ for all $k \ge j$. Let $a_1 = a^{2^j}, a_2, \ldots, a_d$ denote
the conjugates of $a^{2^j}$ and choose a positive integer c such that $f_0 = c(X - a_1) \cdots (X - a_d) \in Z[X]$
is primitive. Then $g_0 = c(X + a_1) \cdots (X + a_d) \in Z[X]$
is also primitive, so by an easy application of Gauss' lemma $f_1 = c^2 (X - a_1^2) \cdots (X - a_d^2) \in Z[X]$
is also primitive. Since $a_1^2$ has degree d and since
$\deg f_1 = d$, it follows that $f_1$ is irreducible over Q. Repeating this argument, we
obtain that $f_r = c^{2^r} (X - a_1^{2^r}) \cdots (X - a_d^{2^r}) \in Z[X]$ is primitive and irreducible.
Next, since $n a_1^{2^r}$ is an algebraic integer, we have $h_r = n^d (X - a_1^{2^r}) \cdots (X - a_d^{2^r})
= (nX - n a_1^{2^r}) \cdots (nX - n a_d^{2^r}) \in Z[X]$, so we must have $\frac{n^d}{c^{2^r}} \in Z$. Since
this happens for all sufficiently large r, it follows that $c = \pm 1$ and so $a^{2^j}$ is an
algebraic integer. As a result, a is an algebraic integer and we're done. □


Proof. This proof uses rather heavy material, but it is much more conceptual
than the previous one. Namely, we will use a local-global principle, stating
that an algebraic number x is an algebraic integer if and only if $v(x) \ge 0$ for
any valuation v on Q. This follows easily from the relations between a number
field and its completions (see the addendum on number fields), but the result
is not obvious at all. Anyway, once we have this, the lemma is immediate: if
v is a valuation, then we know that $v(n) + k v(a) \ge 0$ for all k. Dividing by
k and making $k \to \infty$ yields $v(a) \ge 0$, which is enough to ensure that a is an
algebraic integer. □


The lemma is proven, and so we are done. E 


Remark 9.21. The case when all $a_i$'s are rational numbers is much easier and
is a rather folklore problem. In this case, the problem reduces easily to the
following: if p is a prime number and if $a_1, a_2, \ldots, a_k$ are integers such that
$p^n$ divides $a_1^n + a_2^n + \cdots + a_k^n$ for all n, then p divides all $a_i$'s. This follows
easily from Euler's theorem, by choosing $n = \varphi(p^N)$ with N sufficiently large.
Actually, using ideal theory as in the previous problem and imitating the proof
for rational numbers, one can give yet another solution of the problem. We
leave this as a nice exercise for the reader.




Remark 9.22. Let us recall the proof of Newton's relations. Let $a_i$ be elements
of a field K of characteristic 0 and define

$$f(X) = \prod_{i=1}^{n} (1 - a_i X) = \sum_{i=0}^{n} b_i X^i.$$

Let $P_k = a_1^k + a_2^k + \cdots + a_n^k$. Observe that

$$\frac{f'(X)}{f(X)} = -\sum_{i=1}^{n} \frac{a_i}{1 - a_i X} = -\sum_{k \ge 1} P_k X^{k-1}.$$

Identifying coefficients in the equality

$$f'(X) = -f(X) \cdot \sum_{k \ge 1} P_k X^{k-1}$$

yields Newton's relations

$$m b_m + P_1 b_{m-1} + \cdots + P_m b_0 = 0$$

for $1 \le m \le n$.
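
As a sanity check, the relations of remark 9.22 can be verified on concrete numbers. The sketch below is only an illustration: the helper names `poly_mul` and `newton_residuals` are ours, and the sample values are arbitrary. It expands $\prod (1 - a_i X)$ exactly and evaluates the left-hand side $m b_m + P_1 b_{m-1} + \cdots + P_m b_0$ for each m.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def newton_residuals(a):
    """Return m*b_m + P_1*b_{m-1} + ... + P_m*b_0 for m = 1..n,
    where prod(1 - a_i X) = sum b_i X^i and P_k = sum a_i^k."""
    n = len(a)
    b = [1]
    for ai in a:
        b = poly_mul(b, [1, -ai])  # multiply by (1 - a_i X)
    P = [None] + [sum(ai ** k for ai in a) for k in range(1, n + 1)]
    return [
        m * b[m] + sum(P[k] * b[m - k] for k in range(1, m + 1))
        for m in range(1, n + 1)
    ]


print(newton_residuals([2, -3, 5, 7]))
print(newton_residuals([Fraction(1, 2), Fraction(-2, 3), 3]))
```

Every residual should vanish, whatever exact values are fed in, since the identity holds in any field of characteristic 0.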


The following is also a very tricky problem. We use a p-adic approach 
to solve it and we refer the reader to the appendices on p-adic numbers and 
number fields for more details. 


19. Let p, q be prime numbers and let r be a positive integer such that $q \mid p - 1$,
q does not divide r and $p > r^{q-1}$. Let $a_1, a_2, \ldots, a_r$ be integers such that
$a_1^{\frac{p-1}{q}} + a_2^{\frac{p-1}{q}} + \cdots + a_r^{\frac{p-1}{q}}$ is a multiple of p. Prove that at least one of
the $a_i$'s is a multiple of p.
J. Borosh, D.A. Hensley, J. Zinn, AMM 10748 


Proof. Let $z = e^{\frac{2i\pi}{q}}$ and let K = Q(z). This is an extension of degree q - 1 of
Q. By choosing a prime dividing p in the ring of algebraic integers of K and
completing K with respect to this prime, we obtain an extension of the p-adic
valuation $v_p$ on K. Moreover, if x is an algebraic integer in K, then $v_p(x) \ge 0$.

Assume that no $a_i$ is a multiple of p and let $z_i = z^{i-1}$ ($1 \le i \le q$). Since

$$\sum_{j=1}^{q} v_p\big(a_i^{\frac{p-1}{q}} - z_j\big) = v_p\big(a_i^{p-1} - 1\big) > 0,$$

there exists $\sigma(i) \in \{1, 2, \ldots, q\}$ such that $v_p\big(a_i^{\frac{p-1}{q}} - z_{\sigma(i)}\big) > 0$. Since we
have $v_p\big(\sum_{i=1}^{r} a_i^{\frac{p-1}{q}}\big) > 0$, it follows that $v_p\big(\sum_{i=1}^{r} z_{\sigma(i)}\big) > 0$. So, if $f(X) =
\sum_{i=1}^{r} X^{\sigma(i) - 1}$, we have $v_p(f(z)) > 0$. Since $f(z_i)$ is an algebraic integer for all
i, it follows that $v_p\big(f(1) \prod_{i=2}^{q} f(z_i)\big) > 0$. Let $N = \prod_{i=2}^{q} f(z_i)$. N is an integer,
because it is a symmetric polynomial expression with integer coefficients in
the roots of the polynomial $X^{q-1} + \cdots + X + 1$. We claim that N is nonzero.
Otherwise, there exists $2 \le i \le q$ such that $f(z_i) = 0$. The irreducibility of
the polynomial $1 + X + \cdots + X^{q-1}$ over the rational numbers implies that f
is a multiple of $1 + X + \cdots + X^{q-1}$. But then r = f(1) is a multiple of q, a
contradiction with the hypothesis.

So N is a nonzero integer and $v_p(rN) > 0$, so that $v_p(N) > 0$ (clearly p
does not divide r). Thus $|N| \ge p$. On the other hand, we have $|f(z_i)| \le r$, so
that $|N| \le r^{q-1} < p$. This contradiction finishes the proof. □


Proof. Here is a more elementary, but still very tricky solution, based on
the theorem of symmetric polynomials. Suppose that none of the $a_i$'s is a
multiple of p and let $h = g^{\frac{p-1}{q}}$, where g is a primitive root mod p. We
can therefore find positive integers $m_i$ such that $a_i^{\frac{p-1}{q}} \equiv h^{m_i} \pmod{p}$. Let
$f = X^{m_1} + X^{m_2} + \cdots + X^{m_r}$ and let $g \in Z[X_1, \ldots, X_{q-1}]$ satisfy

$$f(X_1) f(X_2) \cdots f(X_{q-1}) = g\big(\sigma_1(X_1, X_2, \ldots, X_{q-1}), \ldots, \sigma_{q-1}(X_1, X_2, \ldots, X_{q-1})\big),$$

where the $\sigma_i$ are the symmetric fundamental sums. Let $z_1, z_2, \ldots, z_{q-1}$ be the
complex roots of $\frac{X^q - 1}{X - 1}$. Note that

$$\frac{X^q - 1}{X - 1} = (X - h)(X - h^2) \cdots (X - h^{q-1}) \in F_p[X],$$

as $h, h^2, \ldots, h^{q-1}$ are distinct qth roots of unity in $F_p$. This implies that

$$\sigma_i(z_1, z_2, \ldots, z_{q-1}) \equiv \sigma_i(h, h^2, \ldots, h^{q-1}) \pmod{p}$$

for all i. Therefore

$$\prod_{i=1}^{q-1} f(z_i) \equiv g\big(\sigma_1(h, h^2, \ldots, h^{q-1}), \ldots, \sigma_{q-1}(h, h^2, \ldots, h^{q-1})\big) = \prod_{i=1}^{q-1} f(h^i) \equiv 0 \pmod{p},$$

that is p divides the integer $N = f(z_1) \cdots f(z_{q-1})$. As in the previous
solution we obtain $|N| \le r^{q-1} < p$ and so N = 0. We conclude as in the
previous solution. □


9.6 Miscellaneous problems 


It is really not easy to solve the following problem without the use of 
minimal polynomials. However, once the yoga of minimal polynomials is
understood, the argument is rather standard.


20. Find the least positive integer n such that $\cos\frac{\pi}{n}$ cannot be written in
the form $p + \sqrt{q} + \sqrt[3]{r}$ with $p, q, r \in Q$.
O. Mushkarov, N. Nikolov, Bulgaria 


Proof. For $n \le 6$, explicit computations show that $\cos\frac{\pi}{n}$ can be written in the
desired form (the argument is a bit tricky for n = 5, but note that $z = e^{\frac{i\pi}{5}}$ is a
solution of the equation $z^4 - z^3 + z^2 - z + 1 = 0$, which can also be written as
$(z + z^{-1})^2 - (z + z^{-1}) - 1 = 0$). The question is whether we can write $\cos\frac{\pi}{7}$ in
the form $p + \sqrt{q} + \sqrt[3]{r}$ with $p, q, r \in Q$ and the answer turns out to be negative,
implying that the answer to the problem is n = 7.

Let us assume that

$$\cos\frac{\pi}{7} = p + \sqrt{q} + \sqrt[3]{r}$$

and first compute the minimal polynomial of $\cos\frac{\pi}{7}$. In order to do this, we
will first find a rational equation of low degree satisfied by $\cos\frac{\pi}{7}$. Let $z = e^{\frac{i\pi}{7}}$,
so that $z^7 = -1$ and

$$z^6 - z^5 + z^4 - z^3 + z^2 - z + 1 = 0.$$

Dividing this by $z^3$ and rearranging terms yields

$$z^3 + \frac{1}{z^3} - \Big(z^2 + \frac{1}{z^2}\Big) + z + \frac{1}{z} - 1 = 0.$$

Thus, if $x = \cos\frac{\pi}{7} = \frac{z + z^{-1}}{2}$, then the previous relation gives

$$8x^3 - 6x - (4x^2 - 2) + 2x - 1 = 0,$$

that is $8x^3 - 4x^2 - 4x + 1 = 0$. Since the polynomial

$$f(X) = X^3 - \frac{1}{2} X^2 - \frac{1}{2} X + \frac{1}{8}$$

is trivially irreducible over the rational numbers (it has degree 3, so we only
have to look for rational roots), this is the minimal polynomial of x. Therefore
x and x - p have degree 3 over Q.
But then (observe that the identity $\big((\sqrt{q} + \sqrt[3]{r}) - \sqrt{q}\big)^3 = r$ easily implies
that $\sqrt{q} \in Q(\sqrt{q} + \sqrt[3]{r})$)

$$[Q(\sqrt{q} + \sqrt[3]{r}) : Q(\sqrt{q})] \cdot [Q(\sqrt{q}) : Q] = [Q(\sqrt{q} + \sqrt[3]{r}) : Q] = 3$$

and since $[Q(\sqrt{q}) : Q]$ is 1 or 2, it follows that $\sqrt{q}$ is a rational number. Thus
$x - p - \sqrt{q} = \sqrt[3]{r}$ and $u = p + \sqrt{q}$ is a rational number. Now, since x is
irrational, we must have $\sqrt[3]{r}$ irrational and so $X^3 - r$ is irreducible over the
rational numbers. Since $f(u + \sqrt[3]{r}) = f(x) = 0$, it follows that $X^3 - r$ divides
$f(u + X)$ and so (for degree reasons) we must have $f(u + X) = X^3 - r$. It
is trivial now, by identifying coefficients, to see that this is not possible. The
result follows. □
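
The two computational claims in this proof are easy to double-check by machine. The following sketch is not part of the argument and the variable names are ours. It evaluates $8x^3 - 4x^2 - 4x + 1$ numerically at $\cos\frac{\pi}{7}$, $\cos\frac{3\pi}{7}$, $\cos\frac{5\pi}{7}$, and runs the rational root test exactly: a rational root would have to be $\pm 1/d$ with d dividing 8.

```python
import math
from fractions import Fraction


def f(x):
    """The cubic 8x^3 - 4x^2 - 4x + 1 from the proof."""
    return 8 * x**3 - 4 * x**2 - 4 * x + 1


# Numerical check: cos(pi/7), cos(3pi/7), cos(5pi/7) are roots up to rounding.
numeric_roots = [math.cos(k * math.pi / 7) for k in (1, 3, 5)]
residuals = [abs(f(x)) for x in numeric_roots]
print(residuals)

# Exact rational root test: candidates are +-1/d with d in {1, 2, 4, 8}.
candidates = [Fraction(s, d) for s in (1, -1) for d in (1, 2, 4, 8)]
has_rational_root = any(f(c) == 0 for c in candidates)
print(has_rational_root)
```

Since a cubic with no rational root is irreducible over Q, the second check is exactly the "we only have to look for rational roots" step.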




Proof. As in the previous solution it is enough to show that we cannot have

$$\cos\frac{\pi}{7} = p + \sqrt{q} + \sqrt[3]{r}.$$

As before we compute that $\cos\frac{\pi}{7}$ satisfies $8x^3 - 4x^2 - 4x + 1 = 0$ and that this
polynomial is irreducible since it has no rational roots. Also either by noting
that the other two roots are $\cos\frac{3\pi}{7}$ and $\cos\frac{5\pi}{7}$ or by plugging in a few values,
we see that this polynomial has three real roots.

Now let $z = e^{2\pi i/3}$ and suppose $x = p \pm \sqrt{q} + z^k \sqrt[3]{r}$. Then

$$r = (x - p \mp \sqrt{q})^3 = (x - p)^3 + 3q(x - p) \mp \big(3(x - p)^2 + q\big)\sqrt{q}.$$

So

$$\big[(x - p)^3 + 3q(x - p) - r\big]^2 = \big(3(x - p)^2 + q\big)^2 q.$$

Thus

$$g(X) = \big[(X - p)^3 + 3q(X - p) - r\big]^2 - \big(3(X - p)^2 + q\big)^2 q \in Q[X]$$

is a sixth degree polynomial with roots $p \pm \sqrt{q} + z^k \sqrt[3]{r}$. If the equality above
holds, then this polynomial must be a multiple of f(X), the minimal polynomial
of $\cos\frac{\pi}{7} = p + \sqrt{q} + \sqrt[3]{r}$. However f(X) has three real roots and g(X)
has only two real roots (the ones with k = 0). Thus this cannot occur. □


We continue with a very beautiful problem and a very elegant solution. 


21. Let $s_1, s_2, \ldots$ and $t_1, t_2, \ldots$ be two infinite nonconstant sequences of rational
numbers such that $(s_i - s_j)(t_i - t_j)$ is an integer for all $i, j \ge 1$.
Prove that there exists a rational number r such that $(s_i - s_j) r$ and $\frac{t_i - t_j}{r}$
are integers for all i, j.


USAMO 2009 


Proof. We start with some useful reductions: first of all, by working with the
sequences $(s_i - s_1)_i$ and $(t_i - t_1)_i$, we may assume that $s_1 = t_1 = 0$. Secondly,
there is u such that $s_u \ne 0$ and, by working with the sequences $\left(\frac{s_n}{s_u}\right)_n$ and
$(s_u \cdot t_n)_n$, we may assume that $s_u = 1$.

Now, by assumption $s_n t_n$ is an integer for all n. But then

$$s_i t_j + s_j t_i = s_i t_i + s_j t_j - (s_i - s_j)(t_i - t_j)$$

is also an integer for all i, j. Since $s_i t_j + s_j t_i$ and $(s_i t_j) \cdot (s_j t_i) = (s_i t_i)(s_j t_j)$
are integers, $s_i t_j$ and $s_j t_i$ are algebraic integers. Since they are also rational
numbers, they must be rational integers. Thus $s_i t_j$ is an integer for all i, j.
For i = u, we obtain that all $t_j$ are integers. Let d be their greatest common
divisor. Then clearly $\frac{t_i}{d}$ is an integer for all i. We claim that $d s_i$ is also
an integer for all i, which will solve the problem. But since d is a linear
combination with integer coefficients of some $t_j$'s (by Bézout's lemma) and
since $s_i t_j \in Z$ for all i, j, it is clear that $d s_i \in Z$ for all i. The conclusion
follows. □


In order to motivate the next problem, we will discuss first a very classical 
and nontrivial result in elementary number theory. The reader is advised to 
read the addendum 9.A before reading the proof. 


Theorem 9.23. (Lucas-Lehmer) Define a sequence by $a_0 = 4$ and $a_{n+1} =
a_n^2 - 2$ for $n \ge 0$. Let m be an odd positive integer and let $n = 2^m - 1$. Then
n is a prime if and only if $n \mid a_{m-2}$.


Proof. The first difficulty is to actually find a manageable formula for the
general term of the sequence. We use the identity $x^2 + x^{-2} = (x + x^{-1})^2 - 2$
and set $a_n = x_n + x_n^{-1}$ for a sequence $x_n > 1$ (note that $a_n > 2$, so $x_n$ exists).
Then $x_{n+1} = x_n^2$, so $x_n = x_0^{2^n}$ and we easily conclude that

$$a_n = (2 + \sqrt{3})^{2^n} + (2 - \sqrt{3})^{2^n}.$$

Suppose that n = p is a prime and $m \ge 3$. Since $p \equiv 1 \pmod{3}$ and
$p \equiv -1 \pmod{8}$, the quadratic reciprocity law implies that $\left(\frac{2}{p}\right) = 1$ and
$\left(\frac{3}{p}\right) = -1$. Pick some $\alpha$ in an algebraic closure³ of $F_p$ such that $\alpha^2 = 3$.

³One does not need the existence of an algebraic closure to prove the existence of $\alpha$: if 3
is a quadratic residue mod p, it is clear what we have to do; otherwise, it is easy to check
that $F_p[X]/(X^2 - 3)$ is a field with $p^2$ elements and we can take for $\alpha$ the image of X in this
field.

Note that $\alpha$ is actually an element of $F_{p^2}$ and that we can define a map
$f : Z[\sqrt{3}] \to F_{p^2}$ by $f(a + b\sqrt{3}) = \bar{a} + \bar{b}\alpha$, where $\bar{a} = a \pmod{p}$ (seen as an
element of $F_{p^2}$). Since $\alpha^2 = 3$, it is immediate to check that this is a ring
homomorphism. Trivially, f vanishes on pZ. Let $x = f(2 + \sqrt{3}) = 2 + \alpha$
and $y = f(2 - \sqrt{3}) = 2 - \alpha$. Thus $x, y \in F_{p^2}$ and they are nonzero, since
xy = f(1) = 1. We want to prove that $f(a_{m-2}) = 0$ or equivalently that
$x^{2^{m-2}} + y^{2^{m-2}} = 0$, i.e. $x^{2^{m-1}} = -1$. Since $2x = (1 + \alpha)^2$, we obtain the
following equality in $F_{p^2}$:

$$2 x^{2^{m-1}} = 2 \left(\frac{2}{p}\right) \cdot x^{\frac{p+1}{2}} = (2x)^{\frac{p+1}{2}} = (1 + \alpha)^{p+1} = (1 + \alpha)(1 + \alpha^p).$$

Since $\alpha^2 = 3$, we have $\alpha^p = \left(\frac{3}{p}\right) \cdot \alpha = -\alpha$, which combined with the previous
equality yields the desired result.
Let us prove the converse now. Suppose that $n \mid a_{m-2}$; we need to show that n is a
prime. It is enough to check that for all $p \mid n$ we have $p > \sqrt{n}$. Since p divides
$a_{m-2}$, the previous arguments yield the equality $(2 + \alpha)^{\frac{n+1}{2}} = -1$ in $F_{p^2}$. Thus
$2 + \alpha \in F_{p^2}^*$ has order n + 1 and Lagrange's theorem yields $n + 1 \mid p^2 - 1$. The
result follows. □
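
Theorem 9.23 translates directly into a primality test for $n = 2^m - 1$. The sketch below is a minimal illustration and the function names are ours. It iterates $a \mapsto a^2 - 2$ modulo n, which is enough to decide whether n divides $a_{m-2}$, and compares the verdict with naive trial division for small odd m.

```python
def lucas_lehmer(m):
    """For odd m >= 3, return True iff n = 2^m - 1 divides a_{m-2},
    where a_0 = 4 and a_{k+1} = a_k^2 - 2 (all computed modulo n)."""
    n = (1 << m) - 1
    a = 4 % n
    for _ in range(m - 2):
        a = (a * a - 2) % n
    return a == 0


def is_prime(n):
    """Naive trial division, used only as an independent check."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


for m in range(3, 20, 2):
    print(m, lucas_lehmer(m), is_prime((1 << m) - 1))
```

Reducing modulo n at every step keeps the numbers small; the sequence $a_k$ itself roughly squares at each step and would otherwise have about $2^k$ digits.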

22. The sequence $a_0, a_1, a_2, \ldots$ is defined by $a_0 = 2$ and $a_{k+1} = 2a_k^2 - 1$ for
$k \ge 0$. Prove that if an odd prime p divides $a_n$ then $2^{n+3}$ divides $p^2 - 1$.
IMO Shortlist 2003 


Proof. Note that $2a_n$ is precisely the sequence studied in theorem 9.23, so

$$a_n = \frac{(2 + \sqrt{3})^{2^n} + (2 - \sqrt{3})^{2^n}}{2}.$$

Let now p > 2 be a prime factor of $a_n$ and let $\alpha \in F_{p^2}$ be such that $\alpha^2 = 3$.
Define f, x, y as in the proof of the previous theorem. Since $p \mid a_n$, we have
$x^{2^n} + y^{2^n} = 0$, thus $x^{2^{n+1}} = -1$. Hence x has order $2^{n+2}$ in the group $F_{p^2}^*$
and so by Lagrange's theorem $2^{n+2}$ divides $p^2 - 1$. Unfortunately, this is not
enough, but we are close.

If $x \in F_p^*$, everything is easy, since then Lagrange's theorem for this
subgroup yields $2^{n+2} \mid p - 1$ and so trivially $2^{n+3} \mid p^2 - 1$. So, assume that x is not
in $F_p^*$. Then x, y are roots of the irreducible polynomial $X^2 - 4X + 1 \in F_p[X]$,
so that we must have $x^p = y$. Indeed, since $x^2 - 4x + 1 = 0$, we also have
(by raising the previous equality to the p-th power and by using the formula
$(x + y)^p = x^p + y^p$, valid in fields of characteristic p) $x^{2p} - 4x^p + 1 = 0$, so
that $x^p$ is also a root of $X^2 - 4X + 1$, which cannot be x (because otherwise
$x^p = x$ and $x \in F_p^*$). Thus $x^p = y$ and so $x^{p+1} = 1$. But then $2^{n+2}$, which is
the order of x, must divide p + 1 and we are done again. □
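
For the reader who likes to see numbers, the statement of problem 22 can be tested on the first few terms. This is a small sketch with helper names of our choosing; the factorization is plain trial division, so we stop at n = 4, where $a_4 = 708158977$ is still small.

```python
def odd_prime_factors(n):
    """Distinct odd prime factors of n, found by trial division."""
    while n % 2 == 0:
        n //= 2
    factors = []
    d = 3
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors


def check_problem_22(n_max):
    """Return triples (n, p, ok) for every odd prime p dividing a_n, n <= n_max,
    where ok records whether 2^(n+3) divides p^2 - 1."""
    results = []
    a = 2  # a_0
    for n in range(n_max + 1):
        for p in odd_prime_factors(a):
            results.append((n, p, (p * p - 1) % 2 ** (n + 3) == 0))
        a = 2 * a * a - 1  # a_{n+1} = 2 a_n^2 - 1
    return results


for row in check_problem_22(4):
    print(row)
```

For instance $a_3 = 18817 = 31 \cdot 607$, and both $31^2 - 1 = 960$ and $607^2 - 1$ are multiples of $2^6 = 64$.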


There is really no obvious approach to the following rather exotic problem. 


23. Let k be a positive integer and let $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_k$ be
two sequences of rational numbers with the property: for any irrational
numbers $x_1, x_2, \ldots, x_k > 1$ there exist positive integers $n_1, n_2, \ldots, n_k$
and $m_1, m_2, \ldots, m_k$ such that

$$a_1 [x_1^{n_1}] + a_2 [x_2^{n_2}] + \cdots + a_k [x_k^{n_k}] = b_1 [x_1^{m_1}] + b_2 [x_2^{m_2}] + \cdots + b_k [x_k^{m_k}].$$

Prove that $a_i = b_i$ for all i.


Gabriel Dospinescu, Mathlinks Contest 


Proof. The key point is the following result: 


Lemma 9.24. For any integer $N \ge 2$ we can find irrational numbers a, b > 1
such that for every positive integer m we have $[a^m] \equiv -1 \pmod{N}$ and $[b^m] \equiv
0 \pmod{N}$.


Proof. We will choose a, b to be algebraic integers of degree 2. Let us show
how to construct a and leave to the reader the details for the construction of
b. We want to find a polynomial with integer coefficients

$$f(X) = X^2 + uX + v = (X - a)(X - c)$$

for some irrational numbers a > 1, 0 < c < 1. In this case, since $a^m + c^m$ is an
integer for all positive integers m, it follows that $[a^m] = a^m + c^m - 1$ for all
m. Thus, we need to ensure that $a^m + c^m \equiv 0 \pmod{N}$ for all m. Since

$$a^{m+1} + c^{m+1} = -u(a^m + c^m) - v(a^{m-1} + c^{m-1})$$

for all m, it is enough to ensure that N divides u, v. Also, to ensure that
0 < c < 1 we will choose v > 0 and 1 + u + v < 0. For instance, we can take
u = -2N, v = N, yielding $a = N + \sqrt{N^2 - N}$.

Similarly for b, we will choose u = -(2N + 1) and v = N, so

$$b = \frac{2N + 1 + \sqrt{4N^2 + 1}}{2}.$$ □
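
The explicit choices $a = N + \sqrt{N^2 - N}$ and $b = \frac{2N + 1 + \sqrt{4N^2 + 1}}{2}$ made above can be tested numerically. The sketch below is an illustration only: the function name `floors_mod` and the precision setting are our assumptions. It uses high-precision decimals so that the integer parts of $a^m$ and $b^m$ are computed reliably for small N and m.

```python
from decimal import Decimal, getcontext, ROUND_FLOOR

getcontext().prec = 80  # plenty of digits for the small cases below


def floors_mod(N, m_max):
    """Return [([a^m] mod N, [b^m] mod N) for m = 1..m_max] with
    a = N + sqrt(N^2 - N) and b = (2N + 1 + sqrt(4N^2 + 1)) / 2."""
    a = N + Decimal(N * N - N).sqrt()
    b = (2 * N + 1 + Decimal(4 * N * N + 1).sqrt()) / 2
    out = []
    for m in range(1, m_max + 1):
        floor_a = int((a ** m).to_integral_value(rounding=ROUND_FLOOR))
        floor_b = int((b ** m).to_integral_value(rounding=ROUND_FLOOR))
        out.append((floor_a % N, floor_b % N))
    return out


print(floors_mod(2, 6))
print(floors_mod(7, 6))
```

Each pair should read (N - 1, 0), matching $[a^m] \equiv -1$ and $[b^m] \equiv 0 \pmod{N}$.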
Coming back to the proof, choose a positive integer N and a, b irrational
numbers as in the lemma. Set $x_1 = a$ and $x_2 = \cdots = x_k = b$. By hypothesis,
we can find positive integers $n_1, n_2, \ldots, n_k$ and $m_1, m_2, \ldots, m_k$ such that

$$a_1 [x_1^{n_1}] + a_2 [x_2^{n_2}] + \cdots + a_k [x_k^{n_k}] = b_1 [x_1^{m_1}] + b_2 [x_2^{m_2}] + \cdots + b_k [x_k^{m_k}].$$

By the properties of a and b we deduce that $a_1 \equiv b_1 \pmod{N}$. Since N was
arbitrary, it follows that $a_1 = b_1$. Since we can do the same with the other
pairs $(a_i, b_i)$, the result follows. □


The following result is really a mathematical gem, taken from [5]. It is 
quite difficult and has a very elementary proof. 


24. Prove that if $p_1, p_2, \ldots, p_n$ are distinct primes and if

$$a_1 \sqrt{p_1} + a_2 \sqrt{p_2} + \cdots + a_n \sqrt{p_n} = 0$$

for some rational numbers $a_1, a_2, \ldots, a_n$, then $a_i = 0$ for all i.


Besicovitch’s theorem 


Proof. We will prove by induction on n the following statement: for any $m \ge 1$
and any distinct primes $q_1, q_2, \ldots, q_m, p_1, p_2, \ldots, p_n$ we have⁴

$$\sqrt{q_1 q_2 \cdots q_m} \notin Q(\sqrt{p_1}, \sqrt{p_2}, \ldots, \sqrt{p_n}).$$

⁴If K/F is an extension of fields and if $x_1, x_2, \ldots, x_n \in K$, we let $F(x_1, x_2, \ldots, x_n)$ be the
smallest subfield of K which contains F and $x_1, x_2, \ldots, x_n$. It is also the set of elements of
the form $f(x_1, x_2, \ldots, x_n)$, where f is a rational function in n variables with coefficients in
F.

Let us prove the base case: assume that n = 1 and that

$$\sqrt{q_1 \cdots q_m} = a + b\sqrt{p_1}$$

for some rational numbers a, b. Squaring this relation and using that $\sqrt{p_1}$ is
irrational, we deduce that ab = 0. But then either $q_1 q_2 \cdots q_m$ or $q_1 q_2 \cdots q_m p_1$
is a perfect square, which is clearly not possible. Now, assume that the result
holds for n and let us prove it for n + 1. Let $F = Q(\sqrt{p_1}, \sqrt{p_2}, \ldots, \sqrt{p_n})$ and
assume that $\sqrt{q_1 q_2 \cdots q_m} = a + b\sqrt{p_{n+1}}$ for some $a, b \in F$. Again, we square
this relation to deduce that

$$2ab\sqrt{p_{n+1}} = q_1 q_2 \cdots q_m - a^2 - p_{n+1} b^2 \in F.$$

However, by the inductive hypothesis we have $\sqrt{p_{n+1}} \notin F$, so we must have
ab = 0. If a = 0, we obtain that $\sqrt{p_{n+1} q_1 q_2 \cdots q_m} \in F$, contradicting the
inductive hypothesis. If b = 0, we get a similar contradiction. In all cases, the
inductive step is proved and the conclusion follows. □


Remark 9.25. In [58], Mordell proved the following generalization: 


Theorem 9.26. Let $K \subset L$ be fields of characteristic 0 and let $x_1, x_2, \ldots, x_r$ be
elements of L such that for all i there exists a least positive integer $n_i$ such that
$x_i^{n_i} \in K$. Suppose that for all integers $e_1, e_2, \ldots, e_r$, if $x_1^{e_1} x_2^{e_2} \cdots x_r^{e_r} \in K$,
then $n_i$ divides $e_i$ for all i. Finally, suppose that $L \subset R$ or that K contains
all $n_i$th roots of unity, for all i. Then $(x_1^{i_1} x_2^{i_2} \cdots x_r^{i_r})_{0 \le i_j < n_j}$ is a linearly
independent set. In particular, $[K(x_1, x_2, \ldots, x_r) : K] = n_1 n_2 \cdots n_r$.


9.7 Notes 


We thank the following people for providing solutions: Amol Aggarwal 
(problem 18), Darij Grinberg (problem 1), Daniel Harrer (problem 9), Holden 
Lee (problems 8, 10), Thanasin Nampaisarn (problem 13), Fedja Nazarov 
(problem 3), Richard Stong (problems 4, 6, 20), Qiaochu Yuan (problems 4, 
17), Victor Wang (problems 9, 19), Gjergji Zaimi (problems 2, 3). 


Addendum 9.A Equations over Finite Fields


This addendum is a modest introduction to finite fields and polynomial
equations over finite fields. There are some very beautiful and extremely deep
results on the subject, which are far beyond the scope of this book. But the
fact that their proofs are very difficult should not be a reason for not presenting
them. We highly recommend the introductory text [43] for more details.

To avoid spending too much time on preliminaries, we will fix a prime
number p and an algebraic closure $\overline{F}_p$ of the field $F_p = Z/pZ$. Recall that
this means that any $x \in \overline{F}_p$ is a root of some nonzero polynomial $f \in F_p[X]$
and that any nonconstant $f \in F_p[X]$ has at least one root in $\overline{F}_p$ (which actually implies
that it splits into linear factors over $\overline{F}_p$). It is a rather nontrivial theorem of
Steinitz that any field has an algebraic closure and any two algebraic closures
are isomorphic. We take this approach when introducing finite fields since it
is pretty rapid, though not very elegant...

Before proving the first fundamental result, let us glorify the following 
easy result, which will be constantly used in this chapter: 


Proposition 9.A.1. Let p be a prime and let A be a ring such that⁵ pa = 0
for all $a \in A$. Then for all powers q of p and for all $a_1, a_2, \ldots, a_n \in A$ we have

$$(a_1 + a_2 + \cdots + a_n)^q = a_1^q + a_2^q + \cdots + a_n^q.$$


Proof. By induction on n, we may assume that n = 2. Then everything follows
from the usual binomial formula, the hypothesis on A and the fact that $\binom{q}{i} \equiv 0
\pmod{p}$ for any $1 \le i < q$. □
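
The only arithmetic input of this proof is the divisibility of the middle binomial coefficients, which is easy to test by brute force. Here is a minimal sketch, with a function name of our choosing. It also shows that the hypothesis "q is a power of p" cannot be dropped.

```python
from math import comb


def middle_binomials_vanish(q, p):
    """True iff p divides C(q, i) for every 1 <= i < q."""
    return all(comb(q, i) % p == 0 for i in range(1, q))


print(middle_binomials_vanish(8, 2))   # q = 2^3
print(middle_binomials_vanish(9, 3))   # q = 3^2
print(middle_binomials_vanish(6, 2))   # q = 6 is not a power of 2
```

The last call fails because C(6, 2) = 15 is odd.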


If q is a power of p, let

$$F_q = \{x \in \overline{F}_p \mid x^q = x\}.$$

We have the following easy, but crucial result:


Theorem 9.A.2. $F_q$ is the unique field with q elements contained in $\overline{F}_p$.


⁵We say that A has characteristic p.


Proof. First, let us check that $F_q$ is a field. It is clearly stable by multiplication
and stability under addition follows from the previous proposition. $F_q$ has q
elements since $X^q - X$ splits into linear factors over $\overline{F}_p$ (because $\overline{F}_p$ is
algebraically closed) and all of these linear factors are distinct (because $X^q - X$
is prime to its derivative -1).

Let us consider now a subfield L of $\overline{F}_p$ with q elements. As $L^*$ is a group
with q - 1 elements, Lagrange's theorem yields $x^{q-1} = 1$ for all $x \in L^*$. Thus
$x^q = x$ for all $x \in L$ and so $L \subset F_q$. A cardinality argument finishes the
proof. □


A more subtle result is the following generalization of Gauss’ classical 
theorem on primitive roots modulo prime numbers. 


Theorem 9.A.3. $F_q^*$ is a cyclic group of order q - 1. More generally, if K is
any field and G is a finite subgroup of $K^*$, then G is cyclic.


Proof. Let d be the maximal order of the elements of G. It is a general property
of finite abelian groups that if $x, y \in G$ have orders m, n, then one can find
$z \in G$ with order lcm(m, n) (the reader can take this as an easy exercise).
Using this, we deduce that the order of any element of G divides d. Thus for
all $g \in G$ we have $g^d = 1$. But the polynomial $X^d - 1 \in K[X]$ vanishes at all
elements of G, so $d \ge |G|$. On the other hand, d is the order of some element
of G, so $d \mid |G|$ by Lagrange's theorem. Therefore d = |G| and G is cyclic. □
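
In the special case $G = F_p^*$ the theorem says that primitive roots modulo p exist, and the proof suggests looking at the maximal order of an element. The following brute-force sketch, with function names of our choosing, computes orders in $(Z/pZ)^*$ and returns the smallest generator.

```python
def multiplicative_order(g, p):
    """Order of g in (Z/pZ)^*, assuming p is prime and p does not divide g."""
    k, x = 1, g % p
    while x != 1:
        x = x * g % p
        k += 1
    return k


def primitive_root(p):
    """Smallest generator of the cyclic group (Z/pZ)^* for a prime p."""
    for g in range(1, p):
        if multiplicative_order(g, p) == p - 1:
            return g


orders_mod_31 = [multiplicative_order(g, 31) for g in range(1, 31)]
print(max(orders_mod_31), primitive_root(31))
```

As in the proof, every order divides the maximal one, and the maximal order equals the size of the group.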


There is a trap concerning finite fields: it is not true that if n > m, then
$F_{p^m} \subset F_{p^n}$. Actually, this inclusion takes place if and only if $X^{p^m - 1} - 1$ divides
$X^{p^n - 1} - 1$ (this follows immediately from the definition and the fact that the
roots of $X^q - X$ are simple) and this happens if and only if $p^m - 1$ divides
$p^n - 1$, which in turn happens if and only if m divides n.

A fundamental object in the theory of finite fields is the Frobenius map

$$Fr_q : F_{q^n} \to F_{q^n}, \quad Fr_q(x) = x^q,$$

an automorphism of $F_{q^n}$ which acts as identity on $F_q$. Moreover, any such
automorphism is an iterate of the Frobenius map and there are precisely n such
automorphisms.⁶ All these results would be pretty hard to prove without
theorem 9.A.3, but they become easy exercises once we have it. The following
result is fundamental. It says that if you know a root of an irreducible
polynomial over $F_q$, then the other roots are obtained by successively applying the
Frobenius map to that root.
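
The last equivalence in this chain, that $p^m - 1$ divides $p^n - 1$ exactly when m divides n, is elementary and can be confirmed by a quick exhaustive test. This is a sketch only and the function name is ours.

```python
def subfield_criterion(p, m, n):
    """True iff p^m - 1 divides p^n - 1."""
    return (p ** n - 1) % (p ** m - 1) == 0


mismatches = [
    (p, m, n)
    for p in (2, 3, 5, 7)
    for n in range(1, 13)
    for m in range(1, n + 1)
    if subfield_criterion(p, m, n) != (n % m == 0)
]
print(mismatches)
```

For example $F_4$ is not contained in $F_8$, because 3 does not divide 7, while $F_4 \subset F_{16}$.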


Theorem 9.A.4. Let $f \in F_q[X]$ be a monic irreducible polynomial of degree
n and let $x \in \overline{F}_p$ be a root of f. Then the roots of f are $x, x^q, x^{q^2}, \ldots, x^{q^{n-1}}$.
In other words, $f(X) = \prod_{i=0}^{n-1} (X - x^{q^i})$.


Proof. The key point is that $x^{q^n} = x$. Indeed, the field generated by x over
$F_q$ (inside $\overline{F}_p$) has $q^n$ elements, because x has degree n over $F_q$, so this field is
$F_{q^n}$. But in $F_{q^n}$ all elements are roots of $X^{q^n} - X$. Having done this, define
the polynomial $G(X) = \prod_{i=0}^{n-1} (X - x^{q^i})$. The key point and proposition 9.A.1
yield

$$G(X)^q = \prod_{i=0}^{n-1} (X^q - x^{q^{i+1}}) = \prod_{i=0}^{n-1} (X^q - x^{q^i}) = G(X^q).$$

Thus, if we write $G(X) = g_0 + g_1 X + \cdots + g_n X^n$, then again by proposition
9.A.1

$$g_0^q + g_1^q X^q + \cdots + g_n^q X^{nq} = g_0 + g_1 X^q + \cdots + g_n X^{nq},$$

which implies that $g_i^q = g_i$ for all i and so $g_i \in F_q$. Thus $G \in F_q[X]$. Since
G vanishes at x and f is irreducible, we deduce that f divides G. A degree
argument finishes the proof. □


9.A.1 Norm and trace maps 


Consider a finite field $F_q$ and a finite extension $F_{q^n}$. Define the norm and
trace maps by

$$N_{F_{q^n}/F_q} : F_{q^n} \to F_q, \quad x \mapsto \prod_{j=0}^{n-1} x^{q^j}, \qquad Tr_{F_{q^n}/F_q} : F_{q^n} \to F_q, \quad x \mapsto \sum_{j=0}^{n-1} x^{q^j}.$$


⁶In fancy terms, the Galois group of the extension $F_{q^n}/F_q$ is cyclic of order n and generated
by $Fr_q$.


9.A. Equations over Finite Fields 437 


The following result summarizes the basic properties of these maps, that will 
be used in future sections. 


Proposition 9.A.5. The norm and trace maps are surjective maps from Fan 
to Fy. The norm map is multiplicative and the trace map is additive. 


Proof. To avoid complicated notations, write $N$ and $T$ for the norm, respectively trace map. First, let us check that $N(x), T(x) \in \mathbb{F}_q$ for all $x \in \mathbb{F}_{q^n}$. It is enough to see that $N(x)^q = N(x)$ and $T(x)^q = T(x)$. For $N(x)$, this is clear since $x^{q^n} = x$, while for $T(x)$, this follows from proposition 9.A.1 and the equality $x^{q^n} = x$. It is clear that $N$ is multiplicative and proposition 9.A.1 shows that $T$ is additive. It remains to prove the surjectivity of these maps.

Let $\xi$ be a generator of $\mathbb{F}_q^*$ and let $u$ be a generator of $\mathbb{F}_{q^n}^*$. There exists $a \in \mathbb{Z}$ such that $\xi = u^a$. As $\xi^{q-1} = 1$, we have $u^{a(q-1)} = 1$ and so there is an integer $b$ such that $a = b \cdot \frac{q^n - 1}{q - 1}$. But then $\xi = N(u^b)$ and the surjectivity of $N$ follows. For the trace map, this argument does not work, however we note that $T(ax) = aT(x)$ for any $a \in \mathbb{F}_q$ and any $x \in \mathbb{F}_{q^n}$. Thus, it is enough to prove that there exists $x$ such that $T(x) \neq 0$. But this is clear, as the polynomial $X + X^q + \cdots + X^{q^{n-1}}$ has at most $q^{n-1}$ roots and so it cannot vanish on all of $\mathbb{F}_{q^n}$. □
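A quick numerical sanity check of the proposition in degree 2 can be sketched as follows. The model is an assumption made for the illustration: for a prime $p \equiv 3 \pmod 4$ we realize $\mathbb{F}_{p^2}$ as $\mathbb{F}_p[i]$ with $i^2 = -1$ and store $a + bi$ as the pair `(a, b)`; the names `mul`, `power`, `norm` and `trace` are ours.

```python
# F_{p^2} modelled as F_p[i] with i^2 = -1, valid when p = 3 (mod 4);
# an element a + b*i is stored as the pair (a, b).  Illustrative encoding.
P = 7


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(x, e):
    """Square-and-multiply exponentiation in F_{p^2}."""
    result = (1, 0)
    while e:
        if e & 1:
            result = mul(result, x)
        x = mul(x, x)
        e >>= 1
    return result


def norm(x):
    # N(x) = x * x^p  (here n = 2 and q = p)
    return mul(x, power(x, P))


def trace(x):
    # Tr(x) = x + x^p
    y = power(x, P)
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


elements = [(a, b) for a in range(P) for b in range(P)]
norms = [norm(x) for x in elements]
traces = [trace(x) for x in elements]
print(sorted({n[0] for n in norms}), sorted({t[0] for t in traces}))
```

In this model the Frobenius is complex conjugation, so the computed values agree with the closed forms $N(a+bi) = a^2 + b^2$ and $\mathrm{Tr}(a+bi) = 2a$.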


9.A.2 Characters of finite fields 


As $\mathbb{F}_{p^n}$ is an $n$-dimensional vector space over $\mathbb{F}_p$, the choice of a basis yields a group isomorphism $\mathbb{F}_{p^n} \simeq \mathbb{F}_p \times \cdots \times \mathbb{F}_p$. Now, basic properties of the dual of a group discussed in section 7.A.1 yield the following result.


Proposition 9.A.6. Let $q$ be a power of $p$. There is an isomorphism of groups $a \mapsto \psi_a$ between $\mathbb{F}_q$ and its character group, where

\[ \psi_a(x) = e^{\frac{2i\pi}{p} \mathrm{Tr}_{\mathbb{F}_q/\mathbb{F}_p}(ax)}. \]

Also, $\mathbb{F}_q^*$ is a cyclic group, so its group of characters is also cyclic of order $q-1$. The following result will play an important role in the following sections, when we will compute the zeta function of a diagonal hypersurface.


438 Chapter 9. A Little Introduction to Algebraic Number Theory 


Proposition 9.A.7. Let $d$ be a divisor of $q-1$. The map $\chi \mapsto \chi_n$, where $\chi_n(x) = \chi(N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x))$, induces a bijection between characters of order $d$ of $\mathbb{F}_q^*$ and characters of order $d$ of $\mathbb{F}_{q^n}^*$.


Proof. The fact that $\chi_n$ is a character of order dividing $d$ is a consequence of the multiplicativity of the norm map. The fact that it has order exactly $d$ and that $\chi \mapsto \chi_n$ is injective is a consequence of the surjectivity of the norm map (proposition 9.A.5). It remains to check the surjectivity of $\chi \mapsto \chi_n$. Let $\psi$ be a character of order $d$ of $\mathbb{F}_{q^n}^*$ and let $u$ be a generator of $\mathbb{F}_{q^n}^*$. Then $\xi = u^{\frac{q^n-1}{q-1}}$ is a generator of $\mathbb{F}_q^*$ and since $\psi(u)^{q-1} = 1$ (because $\psi^d = 1$ and $d \mid q-1$), there is a unique character $\chi$ of $\mathbb{F}_q^*$ such that $\chi(\xi) = \psi(u)$. By construction, $\psi = \chi_n$ and the result follows. □


Just as for Dirichlet characters, it is convenient to extend the definition of a multiplicative character $\chi$ of $\mathbb{F}_q^*$ to $\mathbb{F}_q$, by defining $\chi(0) = 0$ if $\chi$ is nontrivial and $\chi(0) = 1$ if $\chi$ is trivial. The following innocent-looking identity will play a crucial role in future arguments and is constantly used when dealing with equations over finite fields:


Proposition 9.A.8. Let $d$ be a divisor of $q-1$ and let $x \in \mathbb{F}_q$. The number of solutions of the equation $y^d = x$ with $y \in \mathbb{F}_q$, denoted $N(y^d = x)$, is $\sum_{\chi^d = 1} \chi(x)$, the sum being taken over all multiplicative characters whose order divides $d$.


Proof. If $x = 0$, this is clear, as both sides are equal to 1. Assume that $x \neq 0$. If the equation $y^d = x$ has a solution in $\mathbb{F}_q$, then it has exactly $d$ such solutions, as the equation $y^d = 1$ has precisely $d$ solutions in $\mathbb{F}_q^*$ (because $d \mid q-1$ and $\mathbb{F}_q^*$ is cyclic of order $q-1$). On the other hand, the dual group of $\mathbb{F}_q^*$ is also cyclic of order $q-1$, so the equation $\chi^d = 1$ has $d$ solutions and for each of them $\chi(x) = \chi(y^d) = \chi(y)^d = 1$, so both sides of the equality we want to prove are equal to $d$ and we are done. Finally, if the equation has no solution, the result is a consequence of the orthogonality relations (theorem 7.A.5) for the abelian group $\mathbb{F}_q^*/\{x^d \mid x \in \mathbb{F}_q^*\}$, whose dual group is precisely the subgroup of those multiplicative characters $\chi$ such that $\chi^d = 1$ (actually, this argument also covers the previous case...). □
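The identity is easy to test numerically over a prime field. The sketch below is an illustration under stated assumptions: $q = p$ is an odd prime, the characters with $\chi^d = 1$ are written as $\chi_k(g^j) = e^{2i\pi kj/d}$ for a generator $g$ found by brute force, and the helper names (`primitive_root`, `character_count`, `brute_count`) are ours.

```python
import cmath


def primitive_root(p):
    """Smallest generator of the cyclic group (Z/pZ)^*, found by brute force."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no generator found; is p an odd prime?")


def character_count(p, d, x):
    """Sum of chi(x) over the d characters chi of F_p^* with chi^d = 1 (d | p-1)."""
    g = primitive_root(p)
    log = {pow(g, j, p): j for j in range(p - 1)}   # discrete logarithm table
    total = 0
    for k in range(d):                  # chi_k(g^j) = exp(2*pi*i*k*j/d)
        if x % p == 0:
            total += 1 if k == 0 else 0  # chi(0) = 1 only for the trivial character
        else:
            total += cmath.exp(2j * cmath.pi * k * log[x % p] / d)
    return total


def brute_count(p, d, x):
    """Number of y in F_p with y^d = x, by direct enumeration."""
    return sum(1 for y in range(p) if (pow(y, d, p) - x) % p == 0)


p, d = 13, 4
for x in range(p):
    print(x, brute_count(p, d, x), round(abs(character_count(p, d, x))))
```

The character sum is computed in floating point, so the comparison with the exact count should be made up to a small tolerance.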




Finally, the following result will be used in the proof of the Davenport- 
Hasse relation, to which the next section is devoted. It is by no means specific 
to finite fields, but the short proof we are going to give uses properties of finite 
fields developed in the previous sections. 


Proposition 9.A.9. Let $x \in \mathbb{F}_{q^n}$ and let

\[ f = X^d - a_1 X^{d-1} + \cdots + (-1)^d a_d \in \mathbb{F}_q[X] \]

be its minimal polynomial over $\mathbb{F}_q$. Then $d \mid n$ and $\prod_{j=0}^{n-1} (X - x^{q^j}) = f^{n/d}$. In particular, $N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x) = a_d^{n/d}$ and $\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x) = \frac{n}{d} a_1$.

Proof. Since $[\mathbb{F}_q(x) : \mathbb{F}_q] = \deg(f) = d$, we have $\mathbb{F}_q(x) = \mathbb{F}_{q^d}$ (this uses theorem 9.A.2). But then $\mathbb{F}_{q^d} \subset \mathbb{F}_{q^n}$ and, as we have already remarked, this implies that $d \mid n$. Next, for degree reasons it is enough to prove that $g = \prod_{j=0}^{n-1} (X - x^{q^j})$ has only one irreducible monic factor, namely $f$. But if $h$ is such a factor, then $h$ has some root $x^{q^j}$. But theorem 9.A.4 implies that $f$ also vanishes at $x^{q^j}$ (note that the cited theorem applies only for $j < d$, but we have $x^{q^d} = x$ anyway, since we have seen that $\mathbb{F}_q(x) = \mathbb{F}_{q^d}$). Thus $\gcd(f, h)$ is nonconstant and by irreducibility $f = h$. This finishes the proof. □


9.A.3 Gauss and Jacobi sums, the Davenport-Hasse relation 


Gauss and Jacobi sums play a fundamental role in the theory of equations 
over finite fields and in number theory, in general. We give here their basic 
properties, that we will need in the following sections. But before doing that, 
it is convenient to define them... 


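Definition 9.A.10. 1) If $\psi$ and $\chi$ are characters of $\mathbb{F}_q$, respectively $\mathbb{F}_q^*$, the associated Gauss sum is

\[ g(\chi, \psi) = \sum_{x \in \mathbb{F}_q^*} \chi(x) \psi(x). \]

2) If $\chi_1$ and $\chi_2$ are characters of $\mathbb{F}_q^*$, the associated Jacobi sum is

\[ J(\chi_1, \chi_2) = \sum_{x, y \in \mathbb{F}_q,\ x + y = 1} \chi_1(x) \chi_2(y). \]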




Theorem 9.A.11. If $\psi$ and $\chi$ are nontrivial, then $|g(\chi, \psi)| = \sqrt{q}$.


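Proof. The orthogonality relations (theorem 7.A.5) yield (using also the substitution $t = \frac{x}{y}$)

\[ \begin{aligned}
|g(\chi, \psi)|^2 &= \sum_{x, y \in \mathbb{F}_q^*} \chi(x/y) \psi(x - y) = \sum_{t, y \in \mathbb{F}_q^*} \chi(t) \psi(y(t-1)) \\
&= \sum_{t \in \mathbb{F}_q^*} \chi(t) \Big[ \sum_{y \in \mathbb{F}_q} \psi(y(t-1)) - 1 \Big] = \sum_{t \in \mathbb{F}_q^*} \chi(t) \big( q \cdot 1_{t=1} - 1 \big) \\
&= q - 1 - \sum_{t \neq 0, 1} \chi(t) = q - \sum_{t \in \mathbb{F}_q^*} \chi(t) = q.
\end{aligned} \]
□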


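Corollary 9.A.12. If $\chi$ and $\psi$ are nontrivial, then

\[ g(\chi, \psi)\, g(\chi^{-1}, \psi) = \chi(-1) q. \]

Proof. This is just a long string of obvious computations, using the previous theorem and the fact that $g(\chi, \psi(-\cdot)) = \chi(-1) g(\chi, \psi)$ (which is immediate by definition and the fact that $x \mapsto -x$ is a permutation of $\mathbb{F}_q^*$). More precisely, since $\overline{\psi(x)} = \psi(-x)$ and $\chi(-1) = \pm 1$, we have

\[ g(\chi^{-1}, \psi) = \sum_{x \in \mathbb{F}_q^*} \overline{\chi(x)}\, \psi(x) = \overline{\sum_{x \in \mathbb{F}_q^*} \chi(x) \psi(-x)} = \overline{g(\chi, \psi(-\cdot))} = \chi(-1)\, \overline{g(\chi, \psi)}, \]

so that $g(\chi, \psi)\, g(\chi^{-1}, \psi) = \chi(-1) |g(\chi, \psi)|^2 = \chi(-1) q$. □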
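Both theorem 9.A.11 and corollary 9.A.12 can be tested numerically for $q = p$ prime. The following is a hedged sketch: the additive character is taken to be $\psi(x) = e^{2i\pi x/p}$, the multiplicative characters are indexed through a brute-force generator, and `make_character` and `gauss_sum` are names introduced only for this illustration.

```python
import cmath


def primitive_root(p):
    """Smallest generator of (Z/pZ)^*, found by brute force (p an odd prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no generator found; is p an odd prime?")


def make_character(p, d, k):
    """chi(g^j) = exp(2*pi*i*k*j/d) on F_p^*, for a fixed generator g; needs d | p-1."""
    g = primitive_root(p)
    log = {pow(g, j, p): j for j in range(p - 1)}
    return lambda x: cmath.exp(2j * cmath.pi * k * log[x % p] / d)


def gauss_sum(p, chi):
    """g(chi, psi) for the additive character psi(x) = exp(2*pi*i*x/p)."""
    return sum(chi(x) * cmath.exp(2j * cmath.pi * x / p) for x in range(1, p))


p, d = 13, 3
chi = make_character(p, d, 1)          # a character of order 3
chi_inv = make_character(p, d, d - 1)  # its inverse
g1, g2 = gauss_sum(p, chi), gauss_sum(p, chi_inv)
print(abs(g1) ** 2, g1 * g2, chi(p - 1) * p)
```

Up to rounding, the first printed value should equal $p$ and the last two should agree.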


One has the following beautiful result which connects Gauss and Jacobi 
sums. Note the striking similarity with Euler’s famous formula 


\[ B(x, y) = \frac{\Gamma(x) \Gamma(y)}{\Gamma(x + y)}, \]

where

\[ \Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\, dt, \qquad B(x, y) = \int_0^1 t^{x-1} (1-t)^{y-1}\, dt, \]

the integrals being convergent for $\mathrm{Re}(x), \mathrm{Re}(y) > 0$.


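Theorem 9.A.13. If $\chi_1, \chi_2$ are nontrivial characters of $\mathbb{F}_q^*$ such that $\chi_1 \cdot \chi_2$ is nontrivial, then for all nontrivial characters $\psi$ of $\mathbb{F}_q$ we have

\[ J(\chi_1, \chi_2) = \frac{g(\chi_1, \psi)\, g(\chi_2, \psi)}{g(\chi_1 \chi_2, \psi)}. \]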


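Proof. This is a rather tricky computation:

\[ J(\chi_1, \chi_2)\, g(\chi_1 \chi_2, \psi) = \sum_{x \in \mathbb{F}_q - \{0, 1\}} \sum_{y \in \mathbb{F}_q^*} \chi_1(x) \chi_1(y) \chi_2(1 - x) \chi_2(y) \psi(y). \]

Using the substitution $a = xy$ and $b = y(1 - x)$, this becomes

\[ \sum_{a, b \in \mathbb{F}_q^*,\ a + b \neq 0} \chi_1(a) \chi_2(b) \psi(a + b) = g(\chi_1, \psi)\, g(\chi_2, \psi) - \sum_{a \in \mathbb{F}_q^*} \chi_1(a) \chi_2(-a). \]

As $\chi_1 \chi_2$ is nontrivial, the orthogonality relations (theorem 7.A.5) yield the desired result. □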


Here is a striking application. Assume that $p \equiv 1 \pmod 4$ is a prime. As $\mathbb{F}_p^*$ is cyclic of order $p - 1$, there exists a nontrivial character $\chi_1$ of order 4 of $\mathbb{F}_p^*$. Let $\chi_2(x) = \left(\frac{x}{p}\right)$ be Legendre's symbol. The previous two theorems imply that $|J(\chi_1, \chi_2)|^2 = p$. On the other hand, it is clear that $\chi_1$ takes only the values $0, \pm 1, \pm i$, thus $J(\chi_1, \chi_2) \in \mathbb{Z}[i]$. In particular, $|J(\chi_1, \chi_2)|^2$ is the sum of the squares of two integers. We recovered thus Fermat's celebrated theorem that any prime of the form $4k + 1$ is the sum of the squares of two integers.
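The argument is constructive, and the Jacobi sum can be computed exactly in $\mathbb{Z}[i]$. The sketch below is one possible implementation under our own conventions: $\chi_1(g^j) = i^j$ for a brute-force generator $g$, Gaussian integers are stored as pairs, and `two_squares` is a hypothetical helper name.

```python
def primitive_root(p):
    """Smallest generator of (Z/pZ)^*, found by brute force (p an odd prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no generator found; is p an odd prime?")


def two_squares(p):
    """Return (a, b) with J(chi1, chi2) = a + b*i, for a prime p = 1 (mod 4).

    chi1(g^j) = i^j has order 4 and chi2 is the Legendre symbol, chi2(g^j) = (-1)^j.
    """
    g = primitive_root(p)
    log = {pow(g, j, p): j for j in range(p - 1)}
    powers_of_i = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    a = b = 0
    for x in range(2, p):                  # x = 0 and x = 1 contribute nothing
        re, im = powers_of_i[log[x] % 4]   # chi1(x)
        sign = 1 if log[(1 - x) % p] % 2 == 0 else -1   # chi2(1 - x)
        a += sign * re
        b += sign * im
    return a, b


for p in (5, 13, 17, 29, 37, 41):
    a, b = two_squares(p)
    print(p, (a, b), a * a + b * b)
```

Everything is done in integer arithmetic, so the equality $a^2 + b^2 = p$ can be checked exactly.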

We end this section with a much deeper result, the famous Davenport- 
Hasse relation. This is a quite strong identity between Gauss sums, which is 
crucial for the proof of Weil’s theorem 9.A.24 that we will see a bit later on. 
It also has relations to the Langlands program, but that is really beyond the 
scope of this book. The proof is very ingenious. 




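Theorem 9.A.14. (Davenport-Hasse) Let $\chi$ and $\psi$ be nontrivial characters of $\mathbb{F}_q^*$, respectively $\mathbb{F}_q$. Then $-g_n(\chi, \psi) = (-g(\chi, \psi))^n$, where

\[ g_n(\chi, \psi) = \sum_{x \in \mathbb{F}_{q^n}} \chi\big(N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x)\big)\, \psi\big(\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x)\big). \]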


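Proof. As $N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x)$ and $\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x)$ only depend on the minimal polynomial of $x$ and not on its roots, we will partition $\mathbb{F}_{q^n}$ into collections of conjugate elements over $\mathbb{F}_q$. Define

\[ \lambda\big(X^d - b_1 X^{d-1} + \cdots + (-1)^d b_d\big) = \chi(b_d)\, \psi(b_1) \]

for any $b_i \in \mathbb{F}_q$. It is easy to check that $\lambda(fg) = \lambda(f) \cdot \lambda(g)$ for all monic polynomials $f, g \in \mathbb{F}_q[X]$. Combining this with proposition 9.A.9 shows that if $x \in \mathbb{F}_{q^n}$ has minimal polynomial $P = X^d - a_1 X^{d-1} + \cdots + (-1)^d a_d$ over $\mathbb{F}_q$, then

\[ \chi\big(N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x)\big)\, \psi\big(\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x)\big) = \chi(a_d)^{n/d}\, \psi\Big(\frac{n}{d} a_1\Big) = \lambda(P)^{n/d}. \]

Summing over all conjugates of $x$ and then over collections of conjugate elements in $\mathbb{F}_{q^n}$ yields the crucial identity

\[ g_n(\chi, \psi) = \sum_{d = \deg P \mid n} d\, \lambda(P)^{n/d}, \]

the sum being taken over all irreducible monic polynomials $P \in \mathbb{F}_q[X]$ whose degree divides $n$.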
To exploit this relation, consider the L-function 


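\[ L(T) = \prod_P \frac{1}{1 - \lambda(P) T^{\deg P}} = \sum_f \lambda(f) T^{\deg f}. \]

Here the product is taken over all $P \in \mathbb{F}_q[X]$ monic irreducible, while the sum is over all $f \in \mathbb{F}_q[X]$ monic. The second equality follows from multiplicativity of $\lambda$ and the unique factorization theorem in $\mathbb{F}_q[X]$. Taking log and differentiating, we obtain

\[ \begin{aligned}
T \cdot \frac{L'(T)}{L(T)} &= T \cdot \frac{d}{dT} \log L(T) \\
&= \sum_P \sum_{n \geq 1} \deg(P)\, \lambda(P)^n\, T^{n \deg(P)} \\
&= \sum_{n \geq 1} \Big( \sum_{d = \deg(P) \mid n} d\, \lambda(P)^{n/d} \Big) T^n.
\end{aligned} \]

Combining this with the key relation of the previous paragraph, we conclude that

\[ T \cdot \frac{L'(T)}{L(T)} = \sum_{n \geq 1} g_n(\chi, \psi)\, T^n. \]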


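Finally, we will show that $L(T)$ has a very simple expression. Note that

\[ L(T) = 1 + \sum_{n \geq 1} \Big( \sum_{\deg f = n} \lambda(f) \Big) T^n. \]

On the other hand,

\[ \sum_{\deg f = 1} \lambda(f) = \sum_{a \in \mathbb{F}_q} \lambda(X - a) = \sum_{a \in \mathbb{F}_q} \chi(a) \psi(a) = g(\chi, \psi), \]

while for $n \geq 2$

\[ \sum_{\deg f = n} \lambda(f) = \sum_{a_1, \ldots, a_n} \chi(a_n) \psi(a_1) = q^{n-2} \sum_{a_1} \psi(a_1) \sum_{a_n} \chi(a_n) = 0, \]

by the orthogonality relations (theorem 7.A.5). We deduce that

\[ L(T) = 1 + g(\chi, \psi) T \]

and the result follows immediately from this and the last equality of the previous paragraph. □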
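The relation can be checked numerically in the smallest nontrivial case $n = 2$. The following sketch rests on assumptions made only for the illustration: $q = p$ is a prime with $p \equiv 3 \pmod 4$, so that $\mathbb{F}_{p^2} = \mathbb{F}_p[i]$ with $i^2 = -1$ and, for $x = a + bi$, $N(x) = a^2 + b^2$ and $\mathrm{Tr}(x) = 2a$; $\chi$ is the Legendre symbol and $\psi(x) = e^{2i\pi x/p}$.

```python
import cmath


def legendre(a, p):
    """Quadratic character of F_p, with the convention chi(0) = 0."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def psi(x, p):
    """Additive character psi(x) = exp(2*pi*i*x/p) of F_p."""
    return cmath.exp(2j * cmath.pi * (x % p) / p)


def gauss_sum(p):
    """g(chi, psi) over F_p for the quadratic character chi."""
    return sum(legendre(x, p) * psi(x, p) for x in range(1, p))


def gauss_sum_degree_two(p):
    """g_2(chi, psi) over F_{p^2} = F_p[i], assuming p = 3 (mod 4).

    For x = a + b*i the Frobenius is conjugation, so N(x) = a^2 + b^2 and Tr(x) = 2a.
    """
    return sum(legendre(a * a + b * b, p) * psi(2 * a, p)
               for a in range(p) for b in range(p))


for p in (3, 7, 11):
    g, g2 = gauss_sum(p), gauss_sum_degree_two(p)
    print(p, -g2, g ** 2)      # Davenport-Hasse for n = 2: -g_2 = (-g)^2 = g^2
```

For $p = 3$ one can also verify by hand that $g = i\sqrt{3}$ and $g_2 = 3$.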




We end this section by stating a very deep result of Dwork, a consequence 
of his proof of the rationality of the zeta function of algebraic varieties. It is 
a vast and very difficult generalization of the Davenport-Hasse relation. 


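Theorem 9.A.15. Let $f, g \in \mathbb{F}_q[X]$ and let $\chi, \psi$ be multiplicative, respectively additive characters of $\mathbb{F}_q$. If

\[ S_n = \sum_{x \in \mathbb{F}_{q^n}} \chi\big(N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(f(x))\big)\, \psi\big(\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(g(x))\big), \]

then there exist polynomials $P, Q \in \mathbb{C}[T]$ such that $P(0) = Q(0) = 1$ and

\[ \exp\Big( \sum_{n \geq 1} \frac{S_n}{n} T^n \Big) = \frac{P(T)}{Q(T)}. \]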

9.A.4 Diagonal equations and a theorem of Weil 


Using almost everything we have done so far, we can prove the following 
beautiful theorem of Weil. The true beauty of the result will be revealed in 
a next section, when we will use this result and the Davenport-Hasse relation 
to compute the zeta function of a diagonal hypersurface. Before stating the 
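theorem, we need some notation. Let $a_0, a_1, \ldots, a_l \in \mathbb{F}_q^*$ and let $\chi_0, \chi_1, \ldots, \chi_l$ be multiplicative characters of $\mathbb{F}_q^*$. Consider the additive character $\psi(x) = e^{\frac{2i\pi}{p} \mathrm{Tr}_{\mathbb{F}_q/\mathbb{F}_p}(x)}$ and let $g(\chi_i) = g(\chi_i, \psi)$. Finally, define

\[ W_q(\chi_0, \chi_1, \ldots, \chi_l) = \frac{g(\chi_0)\, g(\chi_1) \cdots g(\chi_l)}{\chi_0(a_0)\, \chi_1(a_1) \cdots \chi_l(a_l)}. \]

Note that this quantity depends on the $a_i$'s, but we suppress the dependence from the notation, as we will consider the $a_i$'s as fixed elements, while the characters $\chi_i$ will vary. We are now ready to state and prove: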


Theorem 9.A.16. (Weil) Let $a_0, a_1, \ldots, a_l \in \mathbb{F}_q^*$ and let $X$ be the projective variety defined by $a_0 x_0^m + a_1 x_1^m + \cdots + a_l x_l^m = 0$. Then

\[ |X(\mathbb{F}_q)| = \sum_{j=0}^{l-1} q^j + \frac{1}{q} \sum_{\chi_0, \chi_1, \ldots, \chi_l} W_q(\chi_0, \chi_1, \ldots, \chi_l), \]

the sum being taken over all nontrivial characters $\chi_j$ of $\mathbb{F}_q^*$ such that $\chi_j^m = 1$ and $\chi_0 \chi_1 \cdots \chi_l = 1$.
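Proof. Let $M$ be the number of solutions $(x_0, x_1, \ldots, x_l) \in \mathbb{F}_q^{l+1}$ of the equation $a_0 x_0^m + a_1 x_1^m + \cdots + a_l x_l^m = 0$. Since we work with a projective variety, we have $|X(\mathbb{F}_q)| = \frac{M - 1}{q - 1}$, so it remains to find $M$. Note that

\[ \begin{aligned}
M &= \sum_{x_0, \ldots, x_l \in \mathbb{F}_q} 1_{a_0 x_0^m + \cdots + a_l x_l^m = 0} \\
&= \sum_{u_0, \ldots, u_l \in \mathbb{F}_q} 1_{a \cdot u = 0} \sum_{x_0, \ldots, x_l} 1_{x_0^m = u_0} \cdots 1_{x_l^m = u_l} \\
&= \sum_{a \cdot u = 0} N(x^m = u_0) \cdots N(x^m = u_l),
\end{aligned} \]

where we wrote for simplicity $a \cdot u = 0$ for $a_0 u_0 + a_1 u_1 + \cdots + a_l u_l = 0$. Using proposition 9.A.8, then expanding the product and re-arranging terms, we obtain

\[ M = \sum_{a \cdot u = 0} \prod_{j=0}^{l} \Big( \sum_{\chi_j^m = 1} \chi_j(u_j) \Big) = \sum_{\chi_j^m = 1,\ \forall\, 0 \leq j \leq l} \Big( \sum_{a \cdot u = 0} \chi_0(u_0) \cdots \chi_l(u_l) \Big). \]

Next, note that

\[ \sum_{a \cdot u = 0} \chi_0(u_0) \chi_1(u_1) \cdots \chi_l(u_l) = \prod_{j=0}^{l} \chi_j(a_j)^{-1} \cdot J_0(\chi_0, \ldots, \chi_l), \]

where

\[ J_0(\chi_0, \chi_1, \ldots, \chi_l) = \sum_{u_0 + u_1 + \cdots + u_l = 0} \chi_0(u_0) \chi_1(u_1) \cdots \chi_l(u_l). \]

So, we end up with the pretty complicated formula

\[ M = \sum_{\chi_0^m = \cdots = \chi_l^m = 1}\ \prod_{j=0}^{l} \chi_j(a_j)^{-1} \cdot J_0(\chi_0, \ldots, \chi_l). \]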




It is convenient to study in more detail these sums $J_0(\chi_0, \chi_1, \ldots, \chi_l)$, which are generalizations of the Jacobi sums discussed in section 9.A.3. The following lemma deals with those terms in the sum defining $M$ for which some character is trivial.


Lemma 9.A.17. Suppose that the trivial character appears in the list $\chi_0, \chi_1, \ldots, \chi_l$. Then either $\chi_j = 1$ for all $j$, in which case $J_0(\chi_0, \chi_1, \ldots, \chi_l) = q^l$, or $J_0(\chi_0, \chi_1, \ldots, \chi_l) = 0$.


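Proof. If all $\chi_j = 1$, it is clear that $J_0(\chi_0, \chi_1, \ldots, \chi_l) = q^l$ (don't forget that the trivial character evaluated at 0 yields 1 by convention). Assume that not all $\chi_j$ are trivial, say (without loss of generality) $\chi_0 = \chi_1 = \cdots = \chi_k = 1$ and $\chi_j \neq 1$ for $j > k$. Then

\[ \begin{aligned}
J_0(\chi_0, \chi_1, \ldots, \chi_l) &= \sum_{u_0 + u_1 + \cdots + u_l = 0} \chi_{k+1}(u_{k+1}) \cdots \chi_l(u_l) \\
&= q^k \sum_{u_{k+1}, \ldots, u_l} \chi_{k+1}(u_{k+1}) \cdots \chi_l(u_l) \\
&= q^k \prod_{j=k+1}^{l} \Big( \sum_{u_j} \chi_j(u_j) \Big) \\
&= 0,
\end{aligned} \]

the last equality being a consequence of the orthogonality relations (theorem 7.A.5). □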


Using this, we obtain

\[ M = q^l + \sum_{\chi_j^m = 1,\ \chi_j \neq 1}\ \prod_{j=0}^{l} \chi_j(a_j)^{-1} \cdot J_0(\chi_0, \chi_1, \ldots, \chi_l). \]


The crucial step is the following lemma, whose proof uses a generalization of 
theorem 9.A.13. 




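Lemma 9.A.18. If $\chi_0, \chi_1, \chi_2, \ldots, \chi_l$ are nontrivial characters and $\chi_0 \cdot \chi_1 \cdots \chi_l \neq 1$, then $J_0(\chi_0, \chi_1, \ldots, \chi_l) = 0$. If $\chi_0 \cdot \chi_1 \cdots \chi_l = 1$, then

\[ J_0(\chi_0, \chi_1, \ldots, \chi_l) = \frac{q - 1}{q}\, g(\chi_0)\, g(\chi_1) \cdots g(\chi_l). \]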


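Proof. Let $J_0 = J_0(\chi_0, \chi_1, \ldots, \chi_l)$ and

\[ J_1 = J_1(\chi_1, \chi_2, \ldots, \chi_l) = \sum_{u_1 + u_2 + \cdots + u_l = 1} \chi_1(u_1) \chi_2(u_2) \cdots \chi_l(u_l). \]

Then

\[ \begin{aligned}
J_0 &= \sum_{t \in \mathbb{F}_q^*}\ \sum_{x_1 + \cdots + x_l = -t} \chi_0(t) \chi_1(x_1) \cdots \chi_l(x_l) \\
&= \sum_{t \in \mathbb{F}_q^*} \chi_0(t)\, (\chi_1 \cdots \chi_l)(-t)\, J_1 \\
&= (\chi_1 \cdots \chi_l)(-1) \Big( \sum_{t \in \mathbb{F}_q^*} \chi_0 \chi_1 \cdots \chi_l(t) \Big) J_1.
\end{aligned} \]

If $\chi_0 \chi_1 \cdots \chi_l \neq 1$, we are done since $\sum_{t \in \mathbb{F}_q^*} \chi_0 \chi_1 \cdots \chi_l(t) = 0$ in this case (orthogonality relations). If $\chi_0 \chi_1 \cdots \chi_l = 1$, the previous equality becomes $J_0 = \chi_0(-1)(q - 1) J_1$. Since $g(\chi_0)\, g(\chi_0^{-1}) = \chi_0(-1) q$ (by corollary 9.A.12), it remains to prove that

\[ J_1 = \frac{g(\chi_1)\, g(\chi_2) \cdots g(\chi_l)}{g(\chi_1 \chi_2 \cdots \chi_l)}. \]

Now, by definition we have

\[ \begin{aligned}
g(\chi_1)\, g(\chi_2) \cdots g(\chi_l) &= \sum_{x_1, x_2, \ldots, x_l} \chi_1(x_1) \chi_2(x_2) \cdots \chi_l(x_l)\, \psi(x_1 + x_2 + \cdots + x_l) \\
&= J_0(\chi_1, \chi_2, \ldots, \chi_l) + \sum_{t \neq 0} \psi(t)\, \chi_1 \chi_2 \cdots \chi_l(t)\, J_1 \\
&= J_0(\chi_1, \chi_2, \ldots, \chi_l) + g(\chi_1 \chi_2 \cdots \chi_l)\, J_1.
\end{aligned} \]

As $\chi_1 \chi_2 \cdots \chi_l \neq 1$, we have $J_0(\chi_1, \chi_2, \ldots, \chi_l) = 0$ by the first part of the lemma and the conclusion follows. □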


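Finally, using the previous lemma, we obtain

\[ M = q^l + \frac{q - 1}{q} \sum_{\chi_0, \ldots, \chi_l}\ \prod_{j=0}^{l} \big( g(\chi_j)\, \chi_j(a_j)^{-1} \big), \]

the sum being taken over all nontrivial characters $\chi_j$ such that $\chi_j^m = 1$ and $\chi_0 \cdots \chi_l = 1$. The result follows. □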
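Weil's formula can be compared with a brute-force count in small cases. The following is a sketch under explicit assumptions: $q = p$ is prime, $m$ divides $p - 1$, the characters with $\chi^m = 1$ are $\chi_k(g^j) = e^{2i\pi kj/m}$ for a brute-force generator $g$, and the function names are ours. Floating-point Gauss sums are used, so agreement is only up to rounding.

```python
import cmath
from itertools import product


def primitive_root(p):
    """Smallest generator of (Z/pZ)^*, found by brute force (p an odd prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no generator found; is p an odd prime?")


def brute_projective_count(p, m, coeffs):
    """Projective points of sum a_j x_j^m = 0 over F_p, via (M - 1)/(p - 1)."""
    affine = sum(1 for xs in product(range(p), repeat=len(coeffs))
                 if sum(a * pow(x, m, p) for a, x in zip(coeffs, xs)) % p == 0)
    return (affine - 1) // (p - 1)


def weil_count(p, m, coeffs):
    """Right-hand side of Weil's formula for q = p, m | p - 1, nonzero coeffs."""
    g = primitive_root(p)
    log = {pow(g, j, p): j for j in range(p - 1)}

    def chi(k, x):                     # chi_k(g^j) = exp(2*pi*i*k*j/m)
        return cmath.exp(2j * cmath.pi * k * log[x % p] / m)

    def gauss(k):                      # g(chi_k) with psi(x) = exp(2*pi*i*x/p)
        return sum(chi(k, x) * cmath.exp(2j * cmath.pi * x / p) for x in range(1, p))

    l = len(coeffs) - 1
    total = 0
    for ks in product(range(1, m), repeat=l + 1):   # nontrivial characters only
        if sum(ks) % m != 0:                        # product must be trivial
            continue
        w = 1
        for k, a in zip(ks, coeffs):
            w *= gauss(k) / chi(k, a)
        total += w
    return sum(p ** j for j in range(l)) + total / p


print(brute_projective_count(7, 3, (1, 1, 1)), weil_count(7, 3, (1, 1, 1)))
```

The first example is the Fermat cubic $x_0^3 + x_1^3 + x_2^3 = 0$ over $\mathbb{F}_7$; other coefficients and exponents can be substituted as long as $m \mid p - 1$.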


9.A.5 The zeta function of an algebraic variety 


In essence, an affine variety over a field $k$ is the locus in $k^N$ of a bunch of polynomial equations in $N$ variables with coefficients in $k$. A projective variety is the locus in the projective space $\mathbb{P}^n(k)$ of a bunch of homogeneous polynomial equations in $n + 1$ variables and coefficients in $k$. If $X$ is an algebraic variety over $\mathbb{F}_q$, it is natural to consider the number of points of $X$ over the various finite extensions $\mathbb{F}_{q^n}$. The zeta function of the variety $X$ is (up to a convenient normalization) the generating function of the sequence obtained in this way, i.e.

\[ Z_X(T) = \exp\Big( \sum_{n \geq 1} \frac{|X(\mathbb{F}_{q^n})|}{n} T^n \Big). \]

Clearly, $Z_X(T)$ is a formal series in $T$ with rational coefficients.


Example 9.A.19. Consider finitely many polynomials $f_1, f_2, \ldots, f_s$ in $k$ variables with coefficients in $\mathbb{F}_p$, and let $a_n$ be the number of solutions in $\mathbb{F}_{p^n}$ of the system of equations

\[ f_1(x_1, \ldots, x_k) = f_2(x_1, \ldots, x_k) = \cdots = f_s(x_1, \ldots, x_k) = 0. \]

The zeta function of the affine variety defined by the polynomials $f_1, \ldots, f_s$ is $\exp\big( \sum_{n \geq 1} \frac{a_n}{n} T^n \big)$. On the other hand, if $f_1, \ldots, f_s$ are homogeneous polynomials, the zeta function of the projective variety defined by these polynomials is $\exp\big( \sum_{n \geq 1} \frac{a_n - 1}{n(p^n - 1)} T^n \big)$, as this time two solutions that differ by a nonzero element of $\mathbb{F}_{p^n}$ are the same element of the projective space.


Remark 9.A.20. Suppose that

\[ |X(\mathbb{F}_{q^n})| = z_1^n + z_2^n + \cdots + z_s^n - u_1^n - \cdots - u_t^n \]

for all $n \geq 1$ and some complex numbers $z_i, u_j$ (as we will see, this always happens, but this is very difficult to prove). Then

\[ Z_X(T) = \frac{(1 - T u_1)(1 - T u_2) \cdots (1 - T u_t)}{(1 - T z_1)(1 - T z_2) \cdots (1 - T z_s)}, \]

essentially by definition of $Z_X(T)$ and by the equality of formal series

\[ \sum_{n \geq 1} \frac{a^n}{n} T^n = -\log(1 - aT). \]

For instance, if $X = \mathbb{P}^r$, the projective space, then

\[ |X(\mathbb{F}_{q^n})| = q^{nr} + q^{n(r-1)} + \cdots + q^n + 1, \]

so

\[ Z_X(T) = \prod_{j=0}^{r} \frac{1}{1 - q^j T}. \]

Another trivial example is the variety defined by the equation $xyz = 1$ in three-dimensional affine space. Then clearly $|X(\mathbb{F}_{q^n})| = (q^n - 1)^2$, so that

\[ Z_X(T) = \frac{(1 - qT)^2}{(1 - T)(1 - q^2 T)}. \]
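The two examples of the remark can be confirmed by expanding both sides as power series. The sketch below uses exact rational arithmetic; the recurrence $n b_n = \sum_{k=1}^{n} a_k b_{n-k}$ comes from differentiating $B = \exp(A)$, and the helper names `exp_series` and `rational_series` are assumptions of this illustration.

```python
from fractions import Fraction


def exp_series(a, terms):
    """Coefficients b_0..b_terms of exp(sum_{n>=1} a(n) T^n / n).

    Uses n * b_n = sum_{k=1}^{n} a(k) * b_{n-k}, obtained from B' = A' * B.
    """
    b = [Fraction(1)]
    for n in range(1, terms + 1):
        b.append(sum(a(k) * b[n - k] for k in range(1, n + 1)) / n)
    return b


def rational_series(numerator, denominator, terms):
    """Power series of numerator/denominator (coefficient lists, denominator[0] == 1)."""
    out = []
    for n in range(terms + 1):
        c = Fraction(numerator[n] if n < len(numerator) else 0)
        c -= sum(denominator[k] * out[n - k]
                 for k in range(1, min(n, len(denominator) - 1) + 1))
        out.append(c)
    return out


q = 3
# xyz = 1 in affine 3-space: |X(F_{q^n})| = (q^n - 1)^2.
torus = exp_series(lambda n: (q ** n - 1) ** 2, 8)
torus_closed = rational_series([1, -2 * q, q * q], [1, -(1 + q * q), q * q], 8)
# Projective line: |X(F_{q^n})| = q^n + 1.
line = exp_series(lambda n: q ** n + 1, 8)
line_closed = rational_series([1], [1, -(1 + q), q], 8)
print(torus == torus_closed, line == line_closed)
```

All the coefficients obtained this way are integers, in line with the discussion that follows.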
Let $\mathbb{Z}[[T]]$ be the set of formal series in $T$ with integer coefficients. Note that in all previous examples we have $Z_X(T) \in \mathbb{Z}[[T]]$. It turns out that this is always the case. The following result discusses more generally when

\[ \exp\Big( \sum_{n \geq 1} \frac{a_n}{n} T^n \Big) \in \mathbb{Z}[[T]]. \]




Proposition 9.A.21. Let $a_n$ be a sequence of integers. There exist unique sequences of rational numbers $b_n$ and $c_n$ such that

\[ \exp\Big( \sum_{n \geq 1} \frac{a_n}{n} T^n \Big) = 1 + \sum_{n \geq 1} b_n T^n = \prod_{n \geq 1} (1 - T^n)^{c_n}. \]

Moreover, all $b_n$ are integers if and only if all $c_n$ are integers, if and only if $n$ divides $\sum_{d \mid n} \mu\left(\frac{n}{d}\right) a_d$ for all $n$ (where $\mu$ is Möbius' function).


Proof. The existence and uniqueness of $b_n$ is clear. As for $c_n$, the key point is to consider

\[ \begin{aligned}
\log \prod_{n \geq 1} (1 - T^n)^{c_n} &= \sum_{n \geq 1} c_n \log(1 - T^n) \\
&= -\sum_{n \geq 1} c_n \sum_{j \geq 1} \frac{T^{nj}}{j} \\
&= -\sum_{n \geq 1} \frac{T^n}{n} \Big( \sum_{d \mid n} d\, c_d \Big).
\end{aligned} \]

Thus we need to find the sequence $c_n$ for which $\sum_{d \mid n} d\, c_d = -a_n$ for all $n$. But the Möbius inversion formula yields the explicit form

\[ c_n = -\frac{1}{n} \sum_{d \mid n} \mu\left(\frac{n}{d}\right) a_d, \]

showing the existence and uniqueness of $c_n$.

It is clear that if $c_n \in \mathbb{Z}$, then $b_n \in \mathbb{Z}$ for all $n$. The converse is proved by induction. Since $c_1 = -b_1$, we have $c_1 \in \mathbb{Z}$. Suppose that $c_1, \ldots, c_{n-1} \in \mathbb{Z}$ and observe that $f = \prod_{i < n} (1 - T^i)^{c_i}$ is invertible in $\mathbb{Z}[[T]]$, as its constant term is 1. Thus

\[ (1 - T^n)^{c_n} (1 - T^{n+1})^{c_{n+1}} \cdots \in \mathbb{Z}[[T]]. \]

Considering the coefficient of $T^n$ in the left-hand side of the previous equality, we obtain that $c_n \in \mathbb{Z}$. Using also the explicit formula of the $c_n$'s in terms of the $a_n$'s yields the last statement of the proposition and finishes the proof. □
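The explicit formula for $c_n$ is easy to evaluate. The sketch below (function names are ours) computes $c_n$ by Möbius inversion with exact fractions; as a sample input we take $a_n = 2^n$, the number of points of the affine line over $\mathbb{F}_{2^n}$, for which $-c_n$ should be the number of monic irreducible polynomials of degree $n$ over $\mathbb{F}_2$.

```python
from fractions import Fraction


def mobius(n):
    """Moebius function computed by trial division."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0          # a squared prime divides the original n
            result = -result
        d += 1
    if n > 1:                     # one remaining prime factor
        result = -result
    return result


def c_sequence(a, n):
    """c_n = -(1/n) * sum_{d | n} mu(n/d) * a(d)."""
    return -Fraction(sum(mobius(n // d) * a(d)
                         for d in range(1, n + 1) if n % d == 0), n)


# a_n = 2^n: -c_n counts monic irreducible polynomials of degree n over F_2.
print([-c_sequence(lambda d: 2 ** d, n) for n in range(1, 7)])
# a_1 = 1 and a_n = 0 otherwise corresponds to exp(T): c_n = -mu(n)/n.
print([c_sequence(lambda d: 1 if d == 1 else 0, n) for n in range(1, 7)])
```

The second call illustrates a case where the $c_n$ are not integers, consistent with $\exp(T) \notin \mathbb{Z}[[T]]$.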


Let $X$ be an algebraic variety over $\mathbb{F}_q$, say defined by some polynomials $f_1, f_2, \ldots, f_d$ in $n$ variables with coefficients in $\mathbb{F}_q$. If

\[ x = (x_1, \ldots, x_n) \in X(\mathbb{F}_{q^m}), \]

define

\[ \mathrm{Fr}(x) = (x_1^q, x_2^q, \ldots, x_n^q). \]

It is easy to see that this is again an element of $X(\mathbb{F}_{q^m})$. Let $f$ be the smallest positive integer such that $x \in X(\mathbb{F}_{q^f})$ and associate to $x$ the cycle $(x, \mathrm{Fr}(x), \ldots, \mathrm{Fr}^{f-1}(x))$, of length $f$. Then $X(\mathbb{F}_{q^m})$ is the disjoint union of the cycles of length dividing $m$, so if $a_n$ is the number of cycles of length $n$, then $|X(\mathbb{F}_{q^m})| = \sum_{d \mid m} d \cdot a_d$. Combining this and the previous proposition yields

\[ Z_X(T) = \prod_{n \geq 1} (1 - T^n)^{-a_n} \in \mathbb{Z}[[T]]. \]


n>1 


Remark 9.A.22. The previous proposition is powerful in other contexts, too. For instance, it immediately yields the equality of formal series

\[ \exp(X) = \prod_{n \geq 1} (1 - X^n)^{-\frac{\mu(n)}{n}}, \]

where $\mu$ is Möbius' function. The proposition also easily implies the following equality

\[ \exp\Big( \sum_{n \geq 0} \frac{X^{p^n}}{p^n} \Big) = \prod_{n \geq 1,\ \gcd(n, p) = 1} (1 - X^n)^{-\frac{\mu(n)}{n}}, \]

showing that the Artin-Hasse exponential $\exp\big( \sum_{n \geq 0} \frac{X^{p^n}}{p^n} \big)$ has coefficients in $\mathbb{Z}_p$. This is absolutely not clear from the definition and plays a major role in $p$-adic analysis and also in Dwork's proof of the rationality of the zeta functions attached to algebraic varieties over finite fields.




In general, it is a very deep problem to compute the zeta function of a 
given variety. Yet even without computing the zeta function, one can say a 
great deal of things about it! This was conjectured by Weil in the wonderful 
paper [84] and proved after a gigantic work by Deligne and Grothendieck. The 
following theorem is really one of the most difficult and beautiful results of
modern mathematics: 


Theorem 9.A.23. (Deligne-Grothendieck) Let $X$ be a non-singular projective variety of dimension $n$ over $\mathbb{F}_q$. Then the zeta function of $X$ is a rational function. More precisely, there are polynomials $P_0, P_1, \ldots, P_{2n} \in \mathbb{Z}[T]$ such that

\[ Z_X(T) = \frac{P_1(T) P_3(T) \cdots P_{2n-1}(T)}{P_0(T) P_2(T) \cdots P_{2n}(T)} \]

and

a) $P_0(T) = 1 - T$ and $P_{2n}(T) = 1 - q^n T$.

b) We can write $P_i = \prod_{j=1}^{b_i} (1 - w_{ij} T)$, where $w_{ij}$ are algebraic integers such that $|w_{ij}| = q^{i/2}$ for all $i, j$.

c) If $\chi = \sum_{i=0}^{2n} (-1)^i b_i$, then the zeta function satisfies the functional equation

\[ Z_X\left( \frac{1}{q^n T} \right) = \pm (q^{n/2} T)^{\chi} Z_X(T) \]

for some sign $\pm$.


If $X$ is a smooth projective curve of genus $g$ (this is an important invariant attached to curves; in the notations of the theorem, we have $b_1 = 2g$) over $\mathbb{F}_q$, Weil proved in 1940 that its zeta function can be written in the form $\frac{P(T)}{(1 - T)(1 - qT)}$ for some polynomial $P \in 1 + t\mathbb{Z}[t]$. Moreover, the roots of $P$ have absolute value $1/\sqrt{q}$. So in this case $b_0 = b_2 = 1$, $b_1 = 2g$ and moreover we can write

\[ |X(\mathbb{F}_{q^n})| = 1 + q^n - (w_1^n + w_2^n + \cdots + w_{2g}^n) \]

with $|w_i| = \sqrt{q}$. In particular, we obtain the very nontrivial estimate

\[ \big|\, |X(\mathbb{F}_{q^n})| - (1 + q^n) \,\big| \leq 2g\, q^{n/2}. \]




This last estimate was obtained by Lang and Weil in 1954 for arbitrary varieties, before the proof of the previous deep theorem. For an even more concrete example, consider integers $a, b$ and a prime $p > 3$, and let $f(x) = x^3 + ax + b$. The condition that the curve $y^2 = f(x)$ be non-singular is that $p$ does not divide $4a^3 + 27b^2$. In this case the curve $y^2 = f(x)$ has genus 1 and one point at infinity. Hence

\[ |X(\mathbb{F}_p)| = 1 + |\{(x, y) \in \mathbb{F}_p \times \mathbb{F}_p \mid y^2 = x^3 + ax + b\}| \]

and the bound above becomes

\[ \big|\, |\{(x, y) \in \mathbb{F}_p \times \mathbb{F}_p \mid y^2 = x^3 + ax + b\}| - p \,\big| \leq 2\sqrt{p}. \]

This reproves a famous theorem of Hasse (there are however easier proofs, but they require a good knowledge of the theory of elliptic curves and quite a lot of algebraic geometry).
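Hasse's bound is pleasant to test by direct enumeration. The following sketch (helper names are ours) counts the affine solutions of $y^2 = x^3 + ax + b$ over $\mathbb{F}_p$ and checks the inequality in the integer form $(N - p)^2 \leq 4p$, which avoids floating point.

```python
def affine_points(p, a, b):
    """Number of (x, y) in F_p x F_p with y^2 = x^3 + a*x + b."""
    square_roots = [0] * p                 # square_roots[v] = #{y : y^2 = v}
    for y in range(p):
        square_roots[y * y % p] += 1
    return sum(square_roots[(x * x * x + a * x + b) % p] for x in range(p))


def satisfies_hasse(p, a, b):
    """Check |N - p| <= 2*sqrt(p) for a non-singular curve (p > 3 prime)."""
    if (4 * a ** 3 + 27 * b ** 2) % p == 0:
        raise ValueError("singular curve: p divides 4a^3 + 27b^2")
    n = affine_points(p, a, b)
    return (n - p) ** 2 <= 4 * p


p = 13
counts = sorted({affine_points(p, a, b)
                 for a in range(p) for b in range(p)
                 if (4 * a ** 3 + 27 * b ** 2) % p != 0})
print(counts)      # every value N should satisfy (N - 13)^2 <= 52
```

For instance, $y^2 = x^3 + 1$ over $\mathbb{F}_5$ has exactly 5 affine points, so the difference with $p$ is 0 in that case.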

Dwork's $p$-adic proof (1960) of the rationality of zeta functions works for affine or projective varieties, be they non-singular or not. For a non-singular projective hypersurface defined by a polynomial $f \in \mathbb{F}_q[X_1, \ldots, X_n]$, homogeneous of degree $d$, Dwork proved that its zeta function is of the form

\[ \frac{P(T)^{(-1)^{n+1}}}{(1 - T)(1 - qT) \cdots (1 - q^{n-2} T)} \]

for some $P \in 1 + t\mathbb{Z}[t]$ of degree $\frac{(d-1)^n + (-1)^n (d-1)}{d}$. We will prove this in the next section, in the much easier case of diagonal hypersurfaces (a famous theorem of Weil).


9.A.6 Zeta function of diagonal hypersurfaces 


In this part, we show how to compute the zeta function of a diagonal 
hypersurface. This beautiful result, due to Weil, was also the starting point of
the famous Weil conjectures. 


Theorem 9.A.24. (Weil) Let $l \geq 1$, $m \mid q - 1$ and let $X$ be the projective hypersurface of equation $a_0 x_0^m + a_1 x_1^m + \cdots + a_l x_l^m = 0$. There exists $P \in \mathbb{Z}[T]$ of degree $d = \frac{(m-1)^{l+1} + (-1)^{l+1} (m-1)}{m}$ such that

a) $Z_X(T) = \dfrac{P(T)^{(-1)^l}}{(1 - T)(1 - qT) \cdots (1 - q^{l-1} T)}$.

b) If $P(z) = 0$, then $1/z$ is an algebraic integer of absolute value $q^{\frac{l-1}{2}}$.

c) There exists an explicit integer $\chi$ such that

\[ Z_X\left( \frac{1}{q^{l-1} T} \right) = \pm \big( q^{\frac{l-1}{2}} T \big)^{\chi} Z_X(T). \]


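Proof. Recall that by theorem 9.A.16 we have for any $q$ an equality

\[ |X(\mathbb{F}_q)| = \sum_{j=0}^{l-1} q^j + \frac{1}{q} \sum_{\chi_0, \chi_1, \ldots, \chi_l} W_q(\chi_0, \chi_1, \ldots, \chi_l), \]

the sum being taken over all nontrivial characters $\chi_j$ of $\mathbb{F}_q^*$ such that $\chi_j^m = 1$ and $\chi_0 \chi_1 \cdots \chi_l = 1$. The main point is to study how the numbers $W_{q^n}(\chi_0, \ldots, \chi_l)$ vary and this is accomplished by the Davenport-Hasse relation. More precisely, recall that for a character $\chi$ of $\mathbb{F}_q^*$ we have a character $\chi_n(x) = \chi(N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x))$ and that $\chi \mapsto \chi_n$ induces a bijection between characters of $\mathbb{F}_q^*$ of a given order and characters of $\mathbb{F}_{q^n}^*$ of the same order (proposition 9.A.7). So, if $S$ is the set of $(l+1)$-tuples $(\chi_0, \ldots, \chi_l)$ of nontrivial characters of $\mathbb{F}_q^*$ such that $\chi_i^m = 1$ and $\chi_0 \chi_1 \cdots \chi_l = 1$, then

\[ |X(\mathbb{F}_{q^n})| = \sum_{j=0}^{l-1} q^{nj} + \frac{1}{q^n} \sum_{(\chi_0, \chi_1, \ldots, \chi_l) \in S} W_{q^n}(\chi_{0,n}, \chi_{1,n}, \ldots, \chi_{l,n}). \]

On the other hand, by the Davenport-Hasse relation (theorem 9.A.14) we can write

\[ W_{q^n}(\chi_{0,n}, \chi_{1,n}, \ldots, \chi_{l,n}) = (-1)^{l+1} \big( (-1)^{l+1} W_q(\chi_0, \ldots, \chi_l) \big)^n. \]

We deduce that

\[ |X(\mathbb{F}_{q^n})| = \sum_{j=0}^{l-1} q^{nj} + (-1)^{l+1} \sum_{(\chi_0, \chi_1, \ldots, \chi_l) \in S} \Big( \frac{(-1)^{l+1}}{q} W_q(\chi_0, \ldots, \chi_l) \Big)^n \]

and so by remark 9.A.20 we finally obtain the first part of the theorem, with

\[ P(T) = \prod_{(\chi_0, \ldots, \chi_l) \in S} \Big( 1 - \frac{(-1)^{l+1}}{q} W_q(\chi_0, \ldots, \chi_l)\, T \Big). \]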



Note that $W_q(\chi_0, \ldots, \chi_l) \neq 0$ for all $(\chi_0, \ldots, \chi_l) \in S$, as the Gauss sum associated to a nontrivial character is nonzero. Thus $\deg(P) = |S|$. It remains to find this number. See $|S|$ as a function $f(l)$ of $l$. Let $g(l)$ be the number of $(l+1)$-tuples $(\chi_0, \ldots, \chi_l)$ of nontrivial characters such that $\chi_i^m = 1$ and $\chi_0 \cdots \chi_l \neq 1$. Clearly, $g(l) = f(l + 1)$. But $f(l) + g(l)$ is just the number of tuples of nontrivial characters such that $\chi_i^m = 1$. As there are $m - 1$ nontrivial characters of order dividing $m$, we deduce that $f(l) + g(l) = (m - 1)^{l+1}$. One immediately deduces that

\[ f(l) = \frac{(m - 1)^{l+1} + (-1)^{l+1} (m - 1)}{m}, \]

finishing therefore the computation of $\deg(P)$.
By definition of $W_q(\chi_0, \ldots, \chi_l)$ and by the fact that $|g(\chi_i)| = \sqrt{q}$, we deduce that $|W_q(\chi_0, \ldots, \chi_l)| = q^{\frac{l+1}{2}}$. This yields part b) of the theorem. Next, the fact that $g(\chi_i)\, g(\chi_i^{-1}) = \chi_i(-1) q$ and $\chi_0 \cdots \chi_l = 1$ implies that

\[ W_q(\chi_0^{-1}, \ldots, \chi_l^{-1}) = \frac{q^{l+1}}{W_q(\chi_0, \ldots, \chi_l)}. \]

As clearly the set $S$ is stable by inversion, we deduce that the map $z \mapsto \frac{1}{q^{l-1} z}$ is a permutation of the roots of $P$. From here, it is an easy but tedious exercise to deduce the third part of the theorem.

Finally, it remains to prove that $P \in \mathbb{Z}[T]$. As $Z_X(T) \in \mathbb{Q}[[T]]$, we must have $P \in \mathbb{Q}[T]$. We will prove that the coefficients of $P$ are algebraic integers, which will be enough to conclude. It is enough (taking into account the definition of $P$) to check that $W_q(\chi_0, \ldots, \chi_l)/q$ is an algebraic integer. As $\chi_i(a_i)$ are roots of unity, it will therefore suffice to check that $\frac{g(\chi_0) \cdots g(\chi_l)}{q}$ is an algebraic integer. But this is an obvious consequence of lemma 9.A.18. This finally proves the theorem! □





Addendum 9.B A Glimpse of Algebraic 
Number Theory 


This addendum recalls the basic properties of number fields. Of course, 
one would need a whole book (and actually much more...) to properly develop 
the theory of number fields, as even proving the basic properties requires a 
lot of commutative algebra. We will try to stay as elementary as possible, 
while still giving some proofs. We warn the reader that a long part of this 
addendum is very abstract. To see the power of the notions and theorems 
discussed, we advise the reader to start with the last part of the addendum, 
which discusses applications to problems with very elementary statements and 
very non-elementary solutions... 


9.B.1 Ideals and quotient rings 


Let $R$ be a commutative ring. An ideal of $R$ is a nonempty subset $I$ of $R$ which is stable under addition and such that $ax \in I$ for all $a \in R$ and $x \in I$. Note that this is far stronger than the stability of $I$ under multiplication. It is fairly easy to construct ideals of $R$: if $x_1, x_2, \ldots, x_n \in R$ then

\[ (x_1, x_2, \ldots, x_n) = \{ a_1 x_1 + \cdots + a_n x_n \mid a_i \in R \} \]

is obviously an ideal, called the ideal generated by $x_1, x_2, \ldots, x_n$.

Once we have an ideal $I$ in a ring $R$, we can naturally construct a quotient ring, whose elements are coset classes $\bar{a} = a + I$ with $a \in R$ and addition, multiplication are defined by $\bar{a} + \bar{b} = \overline{a + b}$ and $\bar{a} \cdot \bar{b} = \overline{ab}$. It is an easy exercise to check that it is well-defined (the issue is that we may have $\bar{a} = \overline{a'}$ even if $a \neq a'$ and one needs to check that if $\bar{a} = \overline{a'}$ and $\bar{b} = \overline{b'}$, then $\overline{a + b} = \overline{a' + b'}$, similarly for multiplication).








Definition 9.B.1. 1) An ideal $I$ of $R$ is called maximal if $I \neq R$ and if $I$ is not contained in any ideal different from $I$ and $R$.


2) An ideal $I$ of $R$ is called prime if $I \neq R$ and $ab \in R - I$ for any $a, b \in R - I$.


9.B. A Glimpse of Algebraic Number Theory 457 


There is a very nice characterization of prime and maximal ideals of a 
ring in terms of quotient rings. The proof is essentially trivial unwinding of 
definitions, but the result is crucial: 


Proposition 9.B.2. 1) An ideal I of R is maximal if and only if R/I is a 
field. 


2) An ideal I of R is prime if and only if R/I has no zero divisors. 


As a field has no zero divisors, this proposition implies that any maximal 
ideal is a prime ideal. There are however prime ideals which are not maximal: 
the ideal $(2)$ in $\mathbb{Z}[X]$ is prime and not maximal, as the quotient ring is $\mathbb{F}_2[X]$, which has no nonzero zero divisors but is not a field.

There are natural operations on ideals: if $I, J$ are ideals of a ring $R$, one defines their sum $I + J = \{ i + j \mid i \in I, j \in J \}$. It is easy to check that this is an ideal. The analogous definition for multiplication would fail (in general) to yield an ideal, so one defines the product of ideals $I, J$ as the ideal generated by all products $i \cdot j$ with $(i, j) \in I \times J$.


9.B.2 Field extensions 


We say that $L$ is a field extension of $K$ if both $K, L$ are fields and $K \subset L$. The extension⁷ $L/K$ is called finite if $L$ is a finite dimensional $K$-vector space. In this case, we define the degree of the extension to be

\[ [L : K] = \dim_K(L). \]

For instance, the extension $\mathbb{C}/\mathbb{R}$ has degree 2, as $1, i$ is a basis of $\mathbb{C}$ over $\mathbb{R}$. On the other hand, the extension $\mathbb{C}/\mathbb{Q}$ is infinite (for example, because $\mathbb{C}$ is uncountable and $\mathbb{Q}$ is countable). We will mostly be interested in finite extensions, for which the following result is of constant use:


Proposition 9.B.3. Let $L/K$ and $M/L$ be finite extensions of fields. Then $M/K$ is finite and

\[ [M : K] = [M : L] \cdot [L : K]. \]


⁷ This notation should not be confused with the quotient ring previously discussed, simply because $K$ is not an ideal in $L$ unless $K = L$.



Proof. One can easily check that if $(x_i)_i$ is a basis of $M$ as $L$-vector space and $(y_j)_j$ is a basis of $L$ as $K$-vector space, then $(x_i y_j)_{i,j}$ is a basis of $M$ as $K$-vector space. □


9.B.3 Algebraic numbers and algebraic integers 


If $L/K$ is an extension of fields and if $l \in L$, we say that $l$ is algebraic over $K$ if there is a nonzero polynomial $f \in K[X]$ such that $f(l) = 0$. In this case, there is a unique monic polynomial $\pi_l \in K[X]$ of least degree which vanishes at $l$. It is called the minimal polynomial of $l$ and it is irreducible in $K[X]$, by minimality. The division algorithm shows that the only polynomials $f \in K[X]$ such that $f(l) = 0$ are the multiples of $\pi_l$. Recall that $K(l)$ is the smallest field containing $K$ and $l$ and it can also be described as

\[ K(l) = \left\{ \frac{f(l)}{g(l)} \,\Big|\, f, g \in K[X],\ g(l) \neq 0 \right\} = K[l] = \{ f(l) \mid f \in K[X] \}. \]

To prove this equality, it suffices to show that if $f \in K[X]$ does not vanish at $l$, then $\frac{1}{f(l)}$ is of the form $A(l)$ for some $A \in K[X]$. But since $f(l) \neq 0$ and $\pi_l$ is irreducible, $f$ and $\pi_l$ are relatively prime, so there are polynomials $A, B \in K[X]$ such that $Af + B\pi_l = 1$. Evaluation at $l$ yields the result. The following proposition is easy, but fundamental.


Proposition 9.B.4. Let $L/K$ be any extension of fields.


1) Let $l \in L$ be algebraic over $K$. Then there is an isomorphism of $K$-algebras⁸ between $K(l)$ and $K[X]/(\pi_l)$. Moreover, $[K(l) : K] = \deg \pi_l < \infty$.


2) Conversely, if $l \in L$ and $[K(l) : K] < \infty$, then $l$ is algebraic over $K$.


Proof. 1) Consider the map sending $f \in K[X]$ to $f(l)$. It is a map of $K$-algebras, vanishing precisely on the ideal $(\pi_l)$, by definition of $\pi_l$. It is easy to check that it induces an isomorphism between $K[X]/(\pi_l)$ and $K[l]$, obtained by sending $f + (\pi_l)$ to $f(l)$. Next, if $d = \deg \pi_l$,


⁸ This means an isomorphism of rings which is $K$-linear.




then (the classes of) $1, X, \ldots, X^{d-1}$ form a $K$-basis of $K[X]/(\pi_l)$ and so $[K[X]/(\pi_l) : K] = d$. Combining this with the previous isomorphism finishes the proof.


2) This is clear: if d = [K(l) : K], then 1,1,/?,...,1¢ € K(l) cannot be 
linearly independent over K. This forces a nonzero polynomial equation 
with coefficients in K satisfied by l and so l is algebraic over K. 

O 
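The Bézout argument used above to show that $K(l) = K[l]$ is effective: the extended Euclidean algorithm produces the polynomial $A$ with $Af + B\pi_l = 1$. Here is a minimal sketch over $K = \mathbb{Q}$, under the assumption that polynomials are stored as coefficient lists (lowest degree first); the helper names `divmod_poly` and `inverse_mod` are ours and purely illustrative.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by b over Q (b must be nonzero)."""
    a = trim(a)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        coef = a[-1] / b[-1]
        q[shift] = coef
        # Cancel the leading term of a.
        a = sub(a, mul([Fraction(0)] * shift + [coef], b))
    return trim(q), a

def inverse_mod(f, pi):
    """Return A with A*f = 1 modulo pi, assuming f and pi are relatively prime."""
    f = [Fraction(c) for c in f]
    pi = [Fraction(c) for c in pi]
    # Invariant of the extended Euclidean algorithm: r_k = a_k * f (mod pi).
    r0, r1 = trim(pi), divmod_poly(f, pi)[1]
    a0, a1 = [], [Fraction(1)]
    while len(r1) > 1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, sub(a0, mul(q, a1))
    c = r1[0]  # nonzero constant, since gcd(f, pi) = 1
    return divmod_poly([x / c for x in a1], pi)[1]

# Inverse of 1 + l in Q(l), where l is the real cube root of 2 (pi = X^3 - 2).
inv = inverse_mod([1, 1], [-2, 0, 0, 1])
```

In this example the result encodes $(1 - l + l^2)/3$, in agreement with the identity $(1 + l)(1 - l + l^2) = 1 + l^3 = 3$.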


Combining the previous results, we can now prove the following nontrivial 
result (which can also be obtained using the theorem on symmetric polynomi- 
als): 


Theorem 9.B.5. Let L/K be any field extension. The set of elements of L 
which are algebraic over K forms a subfield of L. This subfield is equal to L 
if L/K is finite. 


Proof. If $l_1, l_2 \in L$ are algebraic over $K$, then by proposition 9.B.4 $K(l_1)/K$
and $K(l_1)(l_2)/K(l_1)$ are finite extensions (note that $l_2$ is also algebraic over
$K(l_1)$). Thus by proposition 9.B.3, $K(l_1)(l_2)/K$ is finite. But $K(l_1)(l_2)$ contains
$K(l_1 + l_2)$, $K(l_1 l_2)$ and $K(l_1/l_2)$. We deduce that if $x \in \{l_1 + l_2, l_1 l_2, l_1/l_2\}$,
then $K(x)/K$ is finite and the result follows by proposition 9.B.4. The second
part is also a trivial consequence of proposition 9.B.4. $\square$


Definition 9.B.6. 1) A number $x \in \mathbb{C}$ is called algebraic if it is algebraic
over $\mathbb{Q}$. It is called an algebraic integer if its minimal polynomial over $\mathbb{Q}$
has integer coefficients. By Gauss' lemma, this is equivalent to the fact
that $x$ is a root of some monic polynomial with integer coefficients.


2) We denote by $\overline{\mathbb{Q}}$ (respectively $\overline{\mathbb{Z}}$) the set of algebraic numbers
(respectively algebraic integers).


The following result is an easy consequence of the theorem of symmetric 
polynomials 9.10. 


Theorem 9.B.7. $\overline{\mathbb{Q}}$ is an algebraically closed field and $\overline{\mathbb{Z}}$ is a ring. For any
$x \in \overline{\mathbb{Q}}$ there exists $n \geq 1$ such that $nx \in \overline{\mathbb{Z}}$.




Proof. The previous theorem shows that $\overline{\mathbb{Q}}$ is a field. Suppose that $x \in \mathbb{C}$
satisfies $x^n + a_{n-1}x^{n-1} + \cdots + a_0 = 0$ for some $a_i \in \overline{\mathbb{Q}}$. We want to prove
that $x \in \overline{\mathbb{Q}}$. Let $a_k^{(j)}$ be the conjugates of $a_k$ (i.e. the roots of the minimal
polynomial of $a_k$, including $a_k$). The theorem of symmetric polynomials easily
implies that

$$f = \prod_{k_0, \dots, k_{n-1}} \left( X^n + a_{n-1}^{(k_{n-1})} X^{n-1} + \cdots + a_0^{(k_0)} \right)$$

has rational coefficients and vanishes at $x$, from where the result follows. To
prove that $\overline{\mathbb{Z}}$ is a ring, consider for instance $x, y \in \overline{\mathbb{Z}}$ and let $x_1, \dots, x_n$ and
$y_1, y_2, \dots, y_m$ be all roots of the minimal polynomials of $x$ and $y$. Another
application of the theorem of symmetric polynomials shows that $\prod_{i,j} (X - x_i - y_j)$,
respectively $\prod_{i,j} (X - x_i \cdot y_j)$, have integer coefficients and vanish at
$x + y$, respectively $x \cdot y$. Finally, to prove the last statement, take $x \in \overline{\mathbb{Q}}$ and
choose integers $a_0, a_1, \dots, a_n$ such that $a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0 = 0$ and
$a_n \neq 0$. Then

$$(a_n x)^n + a_{n-1}(a_n x)^{n-1} + \cdots + a_0 a_n^{n-1} = 0,$$

so $a_n x \in \overline{\mathbb{Z}}$. $\square$
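The products $\prod_{i,j}(X - x_i - y_j)$ and $\prod_{i,j}(X - x_i y_j)$ used in this proof can be computed explicitly in small cases. The following sketch does so numerically for $x = \sqrt{2}$ and $y = \sqrt{3}$; it relies on floating-point arithmetic and rounding, so it is an illustration rather than a proof, and the function name `poly_from_roots` is ours.

```python
from itertools import product
from math import sqrt

def poly_from_roots(roots):
    """Coefficients (highest degree first) of the monic polynomial with the given roots."""
    coeffs = [1.0]
    for r in roots:
        # Multiply the current polynomial by (X - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

xs = [sqrt(2), -sqrt(2)]   # roots of X^2 - 2
ys = [sqrt(3), -sqrt(3)]   # roots of X^2 - 3

# Polynomial vanishing at x + y, and polynomial vanishing at x * y.
sum_poly = [round(c) for c in poly_from_roots([x + y for x, y in product(xs, ys)])]
prod_poly = [round(c) for c in poly_from_roots([x * y for x, y in product(xs, ys)])]
```

Here `sum_poly` encodes $X^4 - 10X^2 + 1$ and `prod_poly` encodes $(X^2 - 6)^2 = X^4 - 12X^2 + 36$, both monic with integer coefficients, as the theorem of symmetric polynomials predicts.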


Let us introduce now the main object of this addendum. 


Definition 9.B.8. A number field is a finite extension of $\mathbb{Q}$. If $K$ is a number
field, then we let $\mathcal{O}_K = K \cap \overline{\mathbb{Z}}$.


Example 9.B.9. 1) Let $d \neq \pm 1$ be a squarefree integer and consider
$K = \mathbb{Q}(\sqrt{d})$ (as usual, if $d < 0$, $\sqrt{d} = i\sqrt{-d}$). Then $K$ has degree 2 over $\mathbb{Q}$,
but the structure of $\mathcal{O}_K$ depends on the residue class modulo 4 of $d$: if
$d \equiv 1 \pmod 4$, then $\mathcal{O}_K = \mathbb{Z}\left[\frac{1+\sqrt{d}}{2}\right]$, while if $d \equiv 2$ or $3 \pmod 4$, then
$\mathcal{O}_K = \mathbb{Z}[\sqrt{d}]$. This is not difficult to prove: imagine that $x \in \mathcal{O}_K$ and
write $x = a + b\sqrt{d}$ with $a, b \in \mathbb{Q}$. If $b = 0$, then $a$ must be an integer (since it is rational
and an algebraic integer), so we are done. Otherwise, the conjugates of $x$
are $x$ and $y = a - b\sqrt{d}$. Thus $2a$ and $a^2 - db^2$ must be integers, which
easily implies the desired result.







2) Let $\zeta_n$ be a primitive $n$th root of unity in $\mathbb{C}$ and consider $K = \mathbb{Q}(\zeta_n)$.
The irreducibility of cyclotomic polynomials (a fairly nontrivial theorem)
implies that the $n$th cyclotomic polynomial is the minimal polynomial
of $\zeta_n$, so that $K$ has degree $\varphi(n)$ over $\mathbb{Q}$. One can prove with quite a
lot of effort that $\mathcal{O}_K = \mathbb{Z}[\zeta_n]$, i.e. there are no algebraic integers in $K$
except for the obvious ones.
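The criterion from the quadratic example (an element $a + b\sqrt{d}$ with rational $a, b$ is an algebraic integer exactly when $2a$ and $a^2 - db^2$ are integers) is easy to test by machine. A minimal sketch, with a function name of our own choosing:

```python
from fractions import Fraction

def is_algebraic_integer(a, b, d):
    """Test whether a + b*sqrt(d) (a, b rational, d squarefree) is an algebraic integer."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a                  # sum of the two conjugates
    norm = a * a - d * b * b       # product of the two conjugates
    return trace.denominator == 1 and norm.denominator == 1

half = Fraction(1, 2)
golden_ok = is_algebraic_integer(half, half, 5)     # (1 + sqrt(5))/2, d = 1 (mod 4)
eisenstein_ok = is_algebraic_integer(half, half, -3)  # (1 + sqrt(-3))/2, d = 1 (mod 4)
bad = is_algebraic_integer(half, half, 3)           # (1 + sqrt(3))/2, d = 3 (mod 4)
```

The first two values are `True` and the last is `False`, matching the dichotomy between $d \equiv 1 \pmod 4$ and $d \equiv 2, 3 \pmod 4$.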


9.B.4 Factorization in $\mathcal{O}_K$, the fundamental theorems


Since the proofs of the theorems stated in this section are rather long and 
technical, we will simply state them without proof, referring the reader to any 
basic number theory book. We prefer to focus on their arithmetic applications. 
Let $K$ be a number field of degree $d$ over $\mathbb{Q}$ and let $\mathcal{O}_K$ be the subring of $K$
consisting of algebraic integers.


Theorem 9.B.10. If $I$ is a nonzero ideal of $\mathcal{O}_K$, then $\mathcal{O}_K/I$ is a finite ring.


Remark 9.B.11. Actually, one can prove that if $[K : \mathbb{Q}] = d$, then there exist
$x_1, x_2, \dots, x_d \in \mathcal{O}_K$ such that the map $\mathbb{Z}^d \to \mathcal{O}_K$ sending $(n_1, n_2, \dots, n_d)$ to
$n_1 x_1 + \cdots + n_d x_d$ is a bijection. This easily implies the previous theorem: let
$x \in I$ be nonzero, then the norm $n$ of $x$ is again in $I$ and is nonzero. Hence
$I$ contains $n\mathcal{O}_K$. But the previous result shows that $\mathcal{O}_K/n\mathcal{O}_K$ is in bijection
with $\mathbb{Z}^d/n\mathbb{Z}^d$, which is finite, with $|n|^d$ elements.


Corollary 9.B.12. If $\wp$ is a nonzero prime ideal, then $\mathcal{O}_K/\wp$ is a finite field
and so $\wp$ is a maximal ideal.


Proof. The ring $R = \mathcal{O}_K/\wp$ is finite with no zero divisors. Let $x \neq 0$ be any
element of $R$. As $R$ is finite, there must be $i < j$ such that $x^i = x^j$. Then
$x^i(x^{j-i} - 1) = 0$ and as there are no zero divisors in $R$, we obtain $x^{j-i} = 1$.
Thus $x$ is a unit. This proves that $R$ is a field and the result follows. $\square$


Definition 9.B.13. If $I$ is a nonzero ideal of $\mathcal{O}_K$, we define its norm
$N(I) = |\mathcal{O}_K/I|$.


This is an integer which lies in $I$, by Lagrange's theorem in the group $\mathcal{O}_K/I$.




We cannot emphasize enough the importance of the following result for
the theory of algebraic number fields. Suffice it to say that it plays exactly the
same role as the fundamental theorem of arithmetic (i.e. the unique factorization
theorem), and we let the reader recall that basically all results
of elementary number theory follow from it.


Theorem 9.B.14. (Kummer, Dedekind) Let $K$ be a number field.


a) Any ideal of $\mathcal{O}_K$, different from $0$ and $\mathcal{O}_K$, can be uniquely (up to
permutation) written in the form $\wp_1 \wp_2 \cdots \wp_n$ for some $n \geq 1$ and some
prime ideals $\wp_i$, not necessarily distinct.


b) If $I$ and $J$ are two nonzero ideals of $\mathcal{O}_K$, then $N(I \cdot J) = N(I) \cdot N(J)$.
Moreover, $I \subset J$ if and only if there exists an ideal $J'$ such that $I = J \cdot J'$.


Remark 9.B.15. One can also prove that for all $x \in \mathcal{O}_K$ we have
$N(x\mathcal{O}_K) = |N(x)|$.


Remark 9.B.16. Suppose that $K$ has degree $d$ over $\mathbb{Q}$ and that $p$ is a rational
prime. Let $p\mathcal{O}_K = \wp_1^{e_1} \cdot \wp_2^{e_2} \cdots \wp_g^{e_g}$ be the factorization of the ideal $p\mathcal{O}_K$.
As $p\mathcal{O}_K$ has norm $p^d$ (by the first part of the remark), theorem 9.B.14 yields
$p^d = N(\wp_1)^{e_1} \cdots N(\wp_g)^{e_g}$ and so there are nonnegative integers $f_i$ such that
$N(\wp_i) = p^{f_i}$. We have the fundamental relation

$$e_1 f_1 + e_2 f_2 + \cdots + e_g f_g = [K : \mathbb{Q}].$$


Remark 9.B.17. Let $K$ be a number field, let $\wp$ be a nonzero prime ideal of
$\mathcal{O}_K$ and let $x \in \mathcal{O}_K$ be prime to $\wp$ (i.e. $\wp$ does not appear in the prime
factorization of $x\mathcal{O}_K$ or, equivalently, $x\mathcal{O}_K + \wp = \mathcal{O}_K$). Since $\mathcal{O}_K/\wp$ is a field
with $N(\wp)$ elements, we obtain from Lagrange's theorem the following useful
analogue of Fermat's little theorem: $x^{N(\wp)-1} \equiv 1 \pmod{\wp}$.
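As a sanity check of this analogue of Fermat's little theorem, one can work in $\mathbb{Z}[i]$ with the prime ideal generated by 3 (which stays prime there, so its norm is 9). The sketch below represents $a + bi$ as a pair `(a, b)` reduced modulo 3; the helper names are ours.

```python
def gauss_mul(x, y, p):
    """Multiply x = (a, b) ~ a + bi by y = (c, d) in Z[i], reducing coefficients mod p."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def gauss_pow(x, n, p):
    """Compute x^n in Z[i]/pZ[i] by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = gauss_mul(result, x, p)
    return result

p = 3  # 3 stays prime in Z[i], so Z[i]/3Z[i] is a field with N = 9 elements
nonzero = [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]
# x^(N - 1) for every nonzero residue class x.
powers = {gauss_pow(x, p * p - 1, p) for x in nonzero}
```

Every nonzero class raised to the power $N(\wp) - 1 = 8$ gives the class of 1, so `powers` is the singleton `{(1, 0)}`.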


Consider now a finite extension $L/K$ of number fields and let $\wp$ be a
nonzero prime ideal of $\mathcal{O}_K$. Let

$$\wp\mathcal{O}_L = \{ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \mid n \geq 1,\ a_i \in \wp,\ x_i \in \mathcal{O}_L \},$$




which is easily seen to be a nonzero ideal of $\mathcal{O}_L$. Moreover, $\wp\mathcal{O}_L \neq \mathcal{O}_L$, as
otherwise we would have $1 = a_1 x_1 + \cdots + a_n x_n$ for some $a_i \in \wp$ and some
$x_i \in \mathcal{O}_L$ and by taking norms, we would get $1 \in \wp$. Hence, by the previous
theorem, there exists a prime $\beta$ of $\mathcal{O}_L$ dividing $\wp\mathcal{O}_L$. By the same theorem,
this simply means that $\wp \subset \beta$. Note that $\beta$ is not unique, but there are only
finitely many possibilities for it. On the other hand, given a nonzero prime
ideal $\beta$ of $\mathcal{O}_L$, there is a unique prime $\wp$ of $\mathcal{O}_K$ such that $\wp \subset \beta$. Indeed, the
existence is obtained by taking $\wp = \beta \cap \mathcal{O}_K$ (it is an easy exercise for the reader
to check that this is indeed a nonzero prime ideal of $\mathcal{O}_K$) and the uniqueness
follows from the fact that any nonzero prime ideal of $\mathcal{O}_K$ is maximal (hence, if
$\wp_1 \subset \beta$ and $\wp_2 \subset \beta$ for some different primes $\wp_i$ of $\mathcal{O}_K$, then $1 \in \wp_1 + \wp_2 \subset \beta$,
a contradiction).


Definition 9.B.18. Let $L/K$ be a finite extension of number fields and let $\wp$
be a nonzero prime ideal of $\mathcal{O}_K$. A prime ideal $\beta$ of $\mathcal{O}_L$ is said to lie above $\wp$
if $\wp \subset \beta$. We also write $\beta \mid \wp$ in this case.


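We can summarize the previous discussion by saying that any prime $\beta$ of $\mathcal{O}_L$
has a unique prime $\wp$ of $\mathcal{O}_K$ below it, namely $\beta \cap \mathcal{O}_K$, and any prime $\wp$ of
$\mathcal{O}_K$ has at least one (but finitely many) prime $\beta$ above it. Note that if $\beta \mid \wp$,
then the inclusion $\mathcal{O}_K \subset \mathcal{O}_L$ induces an injective morphism $\mathcal{O}_K/\wp \to \mathcal{O}_L/\beta$,
realizing therefore $\mathcal{O}_L/\beta$ as a finite extension of $\mathcal{O}_K/\wp$.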


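Definition 9.B.19. Let $\beta \mid \wp$ be as above.
1) The residual degree of $\beta/\wp$ is defined by $f(\beta/\wp) = [\mathcal{O}_L/\beta : \mathcal{O}_K/\wp]$.


2) The ramification index of $\beta/\wp$ is the exponent $e(\beta/\wp)$ of $\beta$ in the prime
factorization of $\wp\mathcal{O}_L$.


Note that by definition we have
$$\wp\mathcal{O}_L = \prod_{\beta \mid \wp} \beta^{e(\beta/\wp)}.$$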


Using this and proposition 9.B.3, we easily obtain the following useful property 
of the ramification index and residual degree in a tower of number fields. 




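Proposition 9.B.20. Let $M/L/K$ be a tower of finite extensions of number
fields and let $\rho \mid \beta \mid \wp$ be primes of $\mathcal{O}_M$, $\mathcal{O}_L$ and $\mathcal{O}_K$. Then

$$e(\rho/\beta) \cdot e(\beta/\wp) = e(\rho/\wp), \qquad f(\rho/\beta) \cdot f(\beta/\wp) = f(\rho/\wp).$$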


Definition 9.B.21. Let $L/K$ be a finite extension of number fields. A prime
$\wp$ of $\mathcal{O}_K$ is called


a) unramified in $L$ if $e(\beta/\wp) = 1$ for all $\beta \mid \wp$ in $L$.


b) totally split in $L$ if $e(\beta/\wp) = 1$ and $f(\beta/\wp) = 1$ for all $\beta \mid \wp$ in $L$. This
is equivalent$^9$ to: $\wp\mathcal{O}_L$ is the product of $[L : K]$ different prime ideals of
$\mathcal{O}_L$.


The following theorem gives a practical way to factor a prime in a number
field. The precise statement is a bit complicated, but the message is very
simple: for most primes $p$, factoring $p\mathcal{O}_K$ in $\mathcal{O}_K$ comes down to factoring
$f \in \mathbb{F}_p[X]$, where $f$ is the minimal polynomial of any primitive element of $K$
that lives in $\mathcal{O}_K$.


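Theorem 9.B.22. (Dedekind, Kummer) Let $K = \mathbb{Q}(x)$ be a number field,
where $x \in \mathcal{O}_K$ has minimal polynomial $f$. Let $p$ be a prime which does not
divide$^{10}$ $|\mathcal{O}_K/\mathbb{Z}[x]|$ and let

$$\overline{f} = \prod_{i=1}^{g} \overline{f_i}^{\,e_i}$$

be the prime factorization of $f$ in $\mathbb{F}_p[X]$. Then

$$p\mathcal{O}_K = \prod_{i=1}^{g} \wp_i^{e_i},$$

where $\wp_i = p\mathcal{O}_K + f_i(x)\mathcal{O}_K$ (with $f_i \in \mathbb{Z}[X]$ any monic lift of $\overline{f_i}$) are different prime ideals and $N(\wp_i) = p^{\deg f_i}$.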


$^9$We have $\sum_{\beta \mid \wp} e(\beta/\wp) \cdot f(\beta/\wp) = [L : K]$, by an argument similar to that used in remark
9.B.16.
$^{10}$Note that the result stated in remark 9.B.11 shows that $\mathcal{O}_K/\mathbb{Z}[x]$ is a finite set.




Proof. Lift arbitrarily $\overline{f_i}$ to some monic polynomials $f_i \in \mathbb{Z}[X]$. The key
point is to prove that the natural map $\mathbb{Z}[X]/(p, f_i) \to \mathbb{Z}[x]/(p, f_i(x))$ sending
(the class of) $h$ to (the class of) $h(x)$ is an isomorphism of rings. Assume
for a moment that we proved this. Let $a = |\mathcal{O}_K/\mathbb{Z}[x]|$, so $\gcd(p, a) = 1$
and $a\mathcal{O}_K \subset \mathbb{Z}[x]$. A standard argument using Bézout's lemma shows that
$\mathcal{O}_K = p\mathcal{O}_K + \mathbb{Z}[x]$, thus $\mathcal{O}_K/\wp_i$ is naturally isomorphic to $\mathbb{Z}[x]/(p, f_i(x))$.
Since $\mathbb{Z}[X]/(p, f_i(X))$ is clearly isomorphic to $\mathbb{F}_p[X]/(\overline{f_i})$, which is a field with
$p^{\deg f_i}$ elements, it follows from the previous discussion that the $\wp_i$ are different
prime ideals with $N(\wp_i) = p^{\deg f_i}$. Since $\wp_i^{e_i} \subset p\mathcal{O}_K + f_i(x)^{e_i}\mathcal{O}_K$, $f(x) = 0$
and $f \equiv \prod_{i=1}^{g} f_i^{e_i} \pmod{p\mathbb{Z}[X]}$, it follows that $\prod_{i=1}^{g} \wp_i^{e_i} \subset p\mathcal{O}_K$ and so by
theorem 9.B.14 we have $p\mathcal{O}_K \mid \prod_{i=1}^{g} \wp_i^{e_i}$. So, if $p\mathcal{O}_K = \prod_{i=1}^{g} \wp_i^{s_i}$, we must have
$s_i \leq e_i$. We conclude by noting that both sums $\sum_i e_i \cdot \deg f_i$ and $\sum_i s_i \cdot \deg f_i$
are equal to $[K : \mathbb{Q}]$ (see remark 9.B.16), so we must have $s_i = e_i$ for all $i$.
Let us prove now that $\mathbb{Z}[X]/(p, f_i) \to \mathbb{Z}[x]/(p, f_i(x))$ is bijective. Since
surjectivity is clear, it remains to prove the following assertion: if $h \in \mathbb{Z}[X]$
satisfies $h(x) \in p\mathbb{Z}[x] + f_i(x)\mathbb{Z}[x]$, then $h \in p\mathbb{Z}[X] + f_i(X) \cdot \mathbb{Z}[X]$. Write
$h(x) = pA(x) + f_i(x)B(x)$ for some $A, B \in \mathbb{Z}[X]$. Since $f$ is the minimal
polynomial of $x$, there is $r \in \mathbb{Z}[X]$ such that $h = pA + f_i B + r f$. It suffices to
use again that $f \in p\mathbb{Z}[X] + f_i \cdot \mathbb{Z}[X]$ to finish the proof. $\square$


Remark 9.B.23. It is not very difficult to prove that if $p$ does not divide the
discriminant of $f$, then $p$ does not divide $|\mathcal{O}_K/\mathbb{Z}[x]|$, so the theorem can be
applied.
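To see theorem 9.B.22 at work, take $K = \mathbb{Q}(\sqrt[3]{2})$ and $f = X^3 - 2$, whose discriminant is $-108$, so that remark 9.B.23 applies to every prime $p \notin \{2, 3\}$. For such $p$ the reduction of $f$ is squarefree, and the number of its roots in $\mathbb{F}_p$ already determines the degrees of its irreducible factors, hence the residual degrees of the primes above $p$. The brute-force sketch below (our own illustrative helper, suitable only for small $p$) reads off this factorization type.

```python
def factor_degrees_cubic(p):
    """Degrees of the irreducible factors of X^3 - 2 over F_p, for a prime p not in {2, 3}."""
    roots = [x for x in range(p) if (x ** 3 - 2) % p == 0]
    # A squarefree cubic has 0, 1 or 3 roots, and the count fixes the factorization type.
    return {0: (3,), 1: (1, 2), 3: (1, 1, 1)}[len(roots)]

types = {p: factor_degrees_cubic(p) for p in (5, 7, 31)}
```

For instance, 7 stays prime (one prime of residual degree 3), 5 is a product of a prime of degree 1 and a prime of degree 2, and 31 is totally split; in each case the degrees add up to $[K : \mathbb{Q}] = 3$, as required by remark 9.B.16.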


9.B.5 Two classical examples 


Consider a squarefree integer $d \neq \pm 1$ and let $K = \mathbb{Q}(\sqrt{d})$. We saw that
$\mathcal{O}_K = \mathbb{Z}[x]$, where $x = \frac{1+\sqrt{d}}{2}$ if $d \equiv 1 \pmod 4$ and $x = \sqrt{d}$ otherwise. The
minimal polynomial of $x$ is $X^2 - X + \frac{1-d}{4}$ in the first case and $X^2 - d$ in the
second case. Theorem 9.B.22 shows that in order to understand the prime
factorization of $p\mathcal{O}_K$, we need to understand the prime factorization of these
polynomials modulo $p$. Since a quadratic polynomial modulo $p$ is reducible
if and only if it has a root in $\mathbb{F}_p$, we easily deduce the following




Proposition 9.B.24. Let $d \neq \pm 1$ be a squarefree integer and let $K = \mathbb{Q}(\sqrt{d})$.
Let $p$ be a prime.


a) If $p > 2$, then $p\mathcal{O}_K$ is a prime ideal if $\left(\frac{d}{p}\right) = -1$, a product of two
different prime ideals if $\left(\frac{d}{p}\right) = 1$ and the square of a prime ideal if $p \mid d$.


b) If $p = 2$, then $p\mathcal{O}_K$ is the square of a prime ideal if $d \equiv 2, 6, 3, 7 \pmod 8$,
a prime ideal if $d \equiv 5 \pmod 8$ and a product of two different prime ideals
if $d \equiv 1 \pmod 8$.
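Proposition 9.B.24 translates directly into a small decision procedure. The sketch below computes the Legendre symbol through Euler's criterion; the function name `splitting_type` and the three labels are our own conventions, and the inputs are assumed to be a squarefree integer $d$ and a prime $p$.

```python
def splitting_type(d, p):
    """Factorization type of p in the ring of integers of Q(sqrt(d)), d squarefree, p prime."""
    if p == 2:
        r = d % 8                      # Python's % returns a residue in 0..7 even for d < 0
        if r in (2, 3, 6, 7):
            return "ramified"          # square of a prime ideal
        return "inert" if r == 5 else "split"
    if d % p == 0:
        return "ramified"
    # Euler's criterion: d^((p-1)/2) = (d/p) modulo p.
    return "split" if pow(d % p, (p - 1) // 2, p) == 1 else "inert"

examples = {
    (-1, 2): splitting_type(-1, 2),
    (-1, 3): splitting_type(-1, 3),
    (-1, 5): splitting_type(-1, 5),
    (5, 2): splitting_type(5, 2),
    (17, 2): splitting_type(17, 2),
}
```

In the Gaussian integers ($d = -1$) this recovers the familiar picture: 2 is ramified, 3 is inert and $5 = (2+i)(2-i)$ is split.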


Consider now $n > 1$ and let $K = \mathbb{Q}(\zeta_n)$, where $\zeta_n$ is a primitive $n$th
root of unity. As we have already said, it is rather difficult to prove that
$\mathcal{O}_K = \mathbb{Z}[\zeta_n]$ and we will take this for granted (actually, it is easier to use
remark 9.B.23). In this case, theorem 9.B.22 reduces the prime factorization
of $p\mathcal{O}_K$ to that of $\phi_n \in \mathbb{F}_p[X]$, where $\phi_n$ is the $n$th cyclotomic polynomial
(whose roots are precisely the primitive $n$th roots of unity). Assume that $p$
does not divide $n$ and let $g$ be an irreducible factor of degree $f$ of $\phi_n \in \mathbb{F}_p[X]$.
Let $x$ be a root of $g$ in an algebraic closure of $\mathbb{F}_p$. Theorem 9.A.4 shows that
$g = \prod_{i=0}^{f-1} (X - x^{p^i})$. Let $d$ be the order of $p$ modulo $n$. Since $n$ is the least
positive exponent $k$ such that $x^k = 1$ (because $x$ is a root of $\phi_n$), it follows that the
sequence $x^{p^i}$ is periodic with period $d$, so we must have $f = d$. That is, $\phi_n$
factors mod $p$ as a product of irreducible polynomials of degree $d$, the order of
$p$ modulo $n$. Since $X^n - 1$ is squarefree modulo $p$ (because the hypothesis that
$p$ does not divide $n$ implies that $X^n - 1$ is relatively prime to its derivative),
we deduce that in theorem 9.B.22 we have $e_1 = e_2 = \cdots = e_g = 1$ and $\deg f_i = d$
for all $i$, so $g = \frac{\varphi(n)}{d}$. Assume now that $p \mid n$ and let $n = p^k \cdot m$ for some $m$
relatively prime to $p$. It is easy to see that $\phi_n(X) = \phi_m(X^{p^k})/\phi_m(X^{p^{k-1}})$, so
modulo $p$ we have $\phi_n = \phi_m^{p^k - p^{k-1}}$. This reduces the problem of factoring $p\mathcal{O}_K$
to the previous case. All in all, we have the following useful result:


Proposition 9.B.25. Let $n$ be an integer greater than 1, let $K = \mathbb{Q}(\zeta_n)$ and
let $p$ be a prime.


a) If $\gcd(n, p) = 1$, then $p\mathcal{O}_K$ is a product of $\frac{\varphi(n)}{\mathrm{ord}(p \bmod n)}$ different prime
ideals, each of degree $\mathrm{ord}(p \bmod n)$.


b) If $p \mid n$ and $n = p^k \cdot m$ with $\gcd(m, p) = 1$, then $p\mathcal{O}_K = (\wp_1 \cdots \wp_s)^{p^k - p^{k-1}}$,
where $s = \frac{\varphi(m)}{\mathrm{ord}(p \bmod m)}$ and each $\wp_i$ is of degree $\mathrm{ord}(p \bmod m)$.
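The numerical data in proposition 9.B.25 are easy to compute. The following sketch returns the triple $(e, f, g)$ consisting of the common ramification index, the common residual degree and the number of primes above $p$ in $\mathbb{Q}(\zeta_n)$; the function names are ours, and the naive implementations of Euler's function and of the multiplicative order are only meant for small inputs.

```python
from math import gcd

def euler_phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def mult_order(p, n):
    """Order of p in the unit group of Z/nZ, assuming gcd(p, n) = 1."""
    if n == 1:
        return 1
    k, x = 1, p % n
    while x != 1:
        x = x * p % n
        k += 1
    return k

def cyclotomic_splitting(n, p):
    """Return (e, f, g) for the prime p in Q(zeta_n), following proposition 9.B.25."""
    m, p_part = n, 1
    while m % p == 0:            # write n = p^k * m with gcd(m, p) = 1
        m //= p
        p_part *= p
    e = euler_phi(p_part)        # p^k - p^(k-1), and 1 when k = 0
    f = mult_order(p, m)
    g = euler_phi(m) // f
    return e, f, g

data = {(n, p): cyclotomic_splitting(n, p) for (n, p) in [(5, 11), (7, 2), (8, 2), (12, 3)]}
```

In each case $e \cdot f \cdot g = \varphi(n) = [K : \mathbb{Q}]$, in accordance with remark 9.B.16; for example 11 is totally split in $\mathbb{Q}(\zeta_5)$, while 2 is totally ramified in $\mathbb{Q}(\zeta_8)$.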


9.B.6 The primitive element theorem and embeddings of 
number fields 


When working with subfields of C, the following result is very handy. We 
will use it constantly to shorten proofs of results which actually hold in much 
greater generality. 


Theorem 9.B.26. (primitive element theorem) Let $L/K$ be a finite extension
of subfields of $\mathbb{C}$. Then there exists $l \in L$ such that $L = K(l)$.


Proof. As $L/K$ is finite, there are elements $x_1, \dots, x_n \in L$ such that
$L = K(x_1, x_2, \dots, x_n)$ (for instance, the elements of a $K$-basis of $L$ over $K$). Thus,
by induction on $n$ it is enough to prove that if $x, y$ are algebraic over $K$, then
there exists $l \in L = K(x, y)$ such that $L = K(l)$.

Let $f, g \in K[X]$ be the minimal polynomials of $x, y$ respectively and let
$x_1 = x, x_2, \dots, x_n$ and $y_1 = y, y_2, \dots, y_m$ be their roots. Clearly, there exists
$c \in K$ such that $x_i + c y_j \neq x + c y$ for all $(i, j) \neq (1, 1)$ (each of the previous
linear equations in $c$ has at most one solution in $K$ and $K$ is infinite). We
claim that $l = x + cy$ works. Clearly $K(l) \subset L$, so it is enough to check
that $x, y \in K(l)$. As $l = x + cy$, it is enough to do it for $y$. Now, we
know $f(l - cy) = 0$, so the polynomial $f(l - cX) \in K(l)[X]$ has a common
root $y$ with $g$. But by construction this is the only common root of these
polynomials. Moreover, it is a simple root, as $g$ is irreducible over $K$, so it has
simple roots. Finally, we conclude that the greatest common divisor of these
two polynomials is $X - y$. As these two polynomials have coefficients in $K(l)$,
so does their greatest common divisor and so $y \in K(l)$. The result follows. $\square$
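For a concrete instance of this proof, take $K = \mathbb{Q}$, $x = \sqrt{2}$, $y = \sqrt{3}$ and $c = 1$. The sketch below checks numerically (in floating point, so only as an illustration) that $c = 1$ satisfies the required condition and that $x$ and $y$ are polynomials in $l = \sqrt{2} + \sqrt{3}$; the explicit expressions come from the identity $l^3 = 11\sqrt{2} + 9\sqrt{3}$.

```python
from itertools import product
from math import sqrt, isclose

x_conj = [sqrt(2), -sqrt(2)]     # roots of the minimal polynomial of x = sqrt(2)
y_conj = [sqrt(3), -sqrt(3)]     # roots of the minimal polynomial of y = sqrt(3)
c = 1
l = x_conj[0] + c * y_conj[0]

# The choice of c must guarantee x_i + c*y_j != x + c*y for all (i, j) != (1, 1).
others = [xi + c * yj for xi, yj in product(x_conj, y_conj)][1:]
c_is_admissible = all(not isclose(v, l, abs_tol=1e-9) for v in others)

# Explicit polynomial expressions of x and y in l, showing that x, y lie in Q(l).
x_from_l = (l ** 3 - 9 * l) / 2
y_from_l = (11 * l - l ** 3) / 2
```

Thus $\mathbb{Q}(\sqrt{2}, \sqrt{3}) = \mathbb{Q}(\sqrt{2} + \sqrt{3})$, a field of degree 4 over $\mathbb{Q}$.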


We will use the primitive element theorem to prove some basic results on
the structure of number fields. The first one concerns the embeddings of a
number field in $\mathbb{C}$. Note that $\mathbb{Q}(\sqrt{2})$ has two such embeddings, namely the
identity map and $a + b\sqrt{2} \mapsto a - b\sqrt{2}$. It turns out that a number field with
degree $d$ has exactly $d$ embeddings in $\mathbb{C}$.




Theorem 9.B.27. Let $L/K$ be an extension of number fields ($L/K$ is
automatically finite). Then any embedding $K \to \mathbb{C}$ extends to exactly $[L : K]$
embeddings $L \to \mathbb{C}$.


Proof. Fix an embedding $\sigma : K \to \mathbb{C}$ and use the primitive element theorem
to write $L = K(l)$ for some $l \in L$. Then $l$ is algebraic over $K$, of degree
$d = [L : K]$, with minimal polynomial $f \in K[X]$. Suppose that $\sigma' : L \to \mathbb{C}$
is an embedding that extends $\sigma$, so $\sigma'(x) = \sigma(x)$ for all $x \in K$. Thus, if
$g \in K[X]$, then $\sigma'(g(l)) = g^{\sigma}(\sigma'(l))$ (where $g^{\sigma}$ is the polynomial obtained from
$g$ by applying $\sigma$ to its coefficients) and so $\sigma'$ is determined by $\sigma'(l)$, which has
to be a root of $f^{\sigma}$ (use the previous equality with $g = f$). Conversely, if $l'$ is
a root of $f^{\sigma}$, we can define an embedding $\sigma'(g(l)) = g^{\sigma}(l')$ for all $g \in K[X]$,
well-defined because any two polynomials $g$ and $h$ with $g(l) = h(l)$ differ by a
multiple of $f$. Thus the embeddings of $L$ into $\mathbb{C}$ that extend $\sigma$ are in bijection
with roots of $f^{\sigma}$ and the result follows. $\square$


Taking K = Q in the previous theorem, we deduce that any number field 
L of degree d over Q embeds in exactly d ways in C. 


9.B.7 A bit of Galois theory 


Let $L, M$ be two extensions of a field $K$ and suppose for simplicity that
they are contained in $\mathbb{C}$. A $K$-morphism $L \to M$ is a $K$-linear map from $L$ to
$M$ which is also a ring homomorphism. Stated differently (but equivalently),
it is an additive and multiplicative map $f : L \to M$ such that $f(x) = x$ for all
$x \in K$.


Definition 9.B.28. Let $f \in K[X]$. The splitting field of $f$ is the field
$K(x_1, x_2, \dots, x_n)$, where $x_i \in \mathbb{C}$ are the roots of $f$.


Note that a $K$-morphism $f : K(x_1, \dots, x_n) \to M$ (where $M/K$ is an
extension) is uniquely determined by the values $f(x_i)$, as any element of
$K(x_1, \dots, x_n)$ is a polynomial with coefficients in $K$ in the $x_i$'s. However,
$f(x_i)$ cannot be any element of $M$, since if $P \in K[X]$ kills $x_i$, then $P$ also
kills $f(x_i)$: note that $P(f(x_i)) = f(P(x_i)) = 0$, as $f$ is $K$-linear and
multiplicative. Moreover, there might be algebraic relations with coefficients in $K$




between the $x_i$'s and the same argument shows that the numbers $f(x_i)$ must
satisfy these relations. Thus, it is a fairly delicate issue to understand these
$K$-morphisms. This is the content of Galois theory. Let us make an important
definition first:


Definition 9.B.29. Let $L$ be an extension of a field $K$. The Galois group of
$L$ over $K$, denoted $\mathrm{Gal}(L/K)$, is the set of bijective $K$-morphisms $f : L \to L$.


We can now prove one basic result of Galois theory, which is also of 
constant use: 


Theorem 9.B.30. Let $L/K$ be an extension of number fields. Then
$$|\mathrm{Gal}(L/K)| \leq [L : K],$$
with equality if and only if $L$ is the splitting field of some polynomial $f \in K[X]$.


Proof. Let $L = K(l)$ for some $l \in L$. Each element $\sigma \in \mathrm{Gal}(L/K)$ is uniquely
determined by $\sigma(l)$, which must be a conjugate of $l$. Thus we have a natural
injection of $\mathrm{Gal}(L/K)$ in the set of conjugates of $l$, which has $[L : K]$ elements.
The inequality follows.

Suppose that we have equality and let $f \in K[X]$ be the minimal polynomial
of $l$. We will prove that $L$ is the splitting field of $f$. It is enough
to prove that if $x$ is a conjugate of $l$, then $x \in L$. But the first paragraph
and the equality case imply that if $l_1, \dots, l_n$ are the conjugates of $l$, then
$\{\sigma(l) \mid \sigma \in \mathrm{Gal}(L/K)\} = \{l_1, \dots, l_n\}$. Thus there exists $\sigma \in \mathrm{Gal}(L/K)$ such
that $x = \sigma(l)$. Since $\sigma(L) \subset L$, we have $x \in L$ and we are done.

Conversely, suppose that $L$ is the splitting field of some $f \in K[X]$, with
roots $x_1, x_2, \dots, x_n$. Theorem 9.B.27 applied to the natural inclusion map
$K \to \mathbb{C}$ shows that there are $[L : K]$ $K$-linear morphisms of rings $L \to \mathbb{C}$.
But for any such morphism $\sigma : L \to \mathbb{C}$ we have $\sigma(x_i) \in L$, as $\sigma(x_i)$ is just
some root of $f$ and $L$ contains all these roots. Thus the image of any such
morphism is a subset of $L$, i.e. any such morphism is an element of $\mathrm{Gal}(L/K)$.
The result follows. $\square$


Definition 9.B.31. We say that a finite extension L/K of number fields is 
Galois if |Gal(L/K)| = [L : K], or, equivalently, if L is the splitting field of a 
polynomial with coefficients in K. 




We are now able to prove the main theorem of Galois theory for number 
fields. The result holds in much greater generality, but we will not need it 
and we prefer to use all the extra data in order to shorten the proof rather 
dramatically. 


Theorem 9.B.32. Let $L/K$ be a finite Galois extension of number fields.
Sending a subfield $M$ of $L$ containing $K$ to $\mathrm{Gal}(L/M)$ yields a bijection between
subfields of $L$ which contain $K$ and subgroups of $\mathrm{Gal}(L/K)$. The inverse of
this bijection is the map sending the subgroup $H$ of $\mathrm{Gal}(L/K)$ to

$$L^H := \{ x \in L \mid \sigma(x) = x,\ \forall \sigma \in H \}.$$

Moreover, we have $H_1 \subset H_2$ if and only if $L^{H_1}$ contains $L^{H_2}$.


Proof. Let us prove first that $L^{\mathrm{Gal}(L/M)} = M$ for any intermediate field $M$
between $L$ and $K$. Note that $L/M$ is again finite and Galois (as $L$ is the
splitting field of a polynomial with coefficients in $K$, thus also in $M$). Write
$L = M(l)$ for some primitive element $l \in L$, with conjugates $l_1, l_2, \dots, l_n$ over
$M$. Suppose that $x \in L^{\mathrm{Gal}(L/M)}$ is not in $M$. Thus, we can find $f \in M[X]$
nonconstant of degree less than $n$ and such that $x = f(l_1)$. We saw in the proof
of the previous theorem that $\{\sigma(l) \mid \sigma \in \mathrm{Gal}(L/M)\} = \{l_1, \dots, l_n\}$. So, for any
$i$ we can find $\sigma_i \in \mathrm{Gal}(L/M)$ such that $\sigma_i(l) = l_i$. Then $x = \sigma_i(x) = f(l_i)$.
Hence $f(l_1) = f(l_2) = \cdots = f(l_n)$, contradicting the fact that $f$ is nonconstant
of degree less than $n$. The result follows.

Next, we need to prove that if $H$ is any subgroup of $\mathrm{Gal}(L/K)$, then
$\mathrm{Gal}(L/L^H) = H$. By the very definition of $L^H$ we have an inclusion
$H \subset \mathrm{Gal}(L/L^H)$. It is thus enough to check that $|\mathrm{Gal}(L/L^H)| \leq |H|$ and, using the
previous theorem, it is enough to check that $[L : L^H] \leq |H|$. Write $L = L^H(l)$
for some $l \in L$. Consider $f(X) = \prod_{\sigma \in H} (X - \sigma(l))$. This is a polynomial of
degree $|H|$ whose coefficients are clearly in $L^H$. Hence $l$ has degree at most
$|H|$ over $L^H$ and the result follows.

It remains to check that $H_1 \subset H_2$ if and only if $L^{H_1}$ contains $L^{H_2}$. It is
clear that if $H_1 \subset H_2$, then $L^{H_1}$ contains $L^{H_2}$ (any element of $L$ fixed by $H_2$
is also fixed by $H_1$). Assume that $L^{H_1}$ contains $L^{H_2}$, so $L^{H_1} \cap L^{H_2} = L^{H_2}$.
But it is clear that the left-hand side of this equality is simply $L^H$, where $H$




is the subgroup of $\mathrm{Gal}(L/K)$ generated by $H_1$ and $H_2$. Hence $L^H = L^{H_2}$ and,
as we have seen, this forces $H = H_2$ and so $H_1 \subset H_2$. $\square$


Remark 9.B.33. Using similar arguments, it is not difficult to show that $H$ is
normal in $\mathrm{Gal}(L/K)$ (i.e. $gHg^{-1} = H$ for any $g \in \mathrm{Gal}(L/K)$) if and only if
$L^H/K$ is Galois.
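The Galois correspondence can be made completely explicit for $L = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ over $K = \mathbb{Q}$. In the sketch below, which uses an encoding of our own, an automorphism is a pair of signs $(s, t)$ with $\sqrt{2} \mapsto s\sqrt{2}$ and $\sqrt{3} \mapsto t\sqrt{3}$, and the basis $1, \sqrt{3}, \sqrt{2}, \sqrt{6}$ of $L$ is encoded by exponent pairs. Since every automorphism acts diagonally on this basis, the fixed field $L^H$ is spanned by the basis elements fixed by $H$.

```python
from itertools import product

# An automorphism is a pair (s, t) of signs: sqrt(2) -> s*sqrt(2), sqrt(3) -> t*sqrt(3).
G = list(product([1, -1], repeat=2))
# Basis element (a, b) stands for sqrt(2)^a * sqrt(3)^b, i.e. 1, sqrt(3), sqrt(2), sqrt(6).
BASIS = list(product([0, 1], repeat=2))

def subgroups():
    """All subgroups of G: the trivial one, G itself, and three of order 2."""
    identity = (1, 1)
    subs = [frozenset([identity]), frozenset(G)]
    subs += [frozenset([identity, g]) for g in G if g != identity]
    return subs

def fixed_basis(H):
    """Basis elements fixed by every automorphism in H; they span the fixed field L^H."""
    return [(a, b) for (a, b) in BASIS if all(s ** a * t ** b == 1 for (s, t) in H)]

# Degree of L^H over Q for each subgroup H, keyed by H.
degrees = {H: len(fixed_basis(H)) for H in subgroups()}
```

For example, the subgroup generated by $(s, t) = (-1, -1)$ fixes exactly $1$ and $\sqrt{6}$, so its fixed field is $\mathbb{Q}(\sqrt{6})$; in all cases $[L^H : \mathbb{Q}] \cdot |H| = 4$, and larger subgroups give smaller fixed fields.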


9.B.8 Prime factorization in a Galois extension 


The results in this section will be crucially used in the applications that 
will be presented at the end of this addendum. They are absolutely funda- 
mental in algebraic number theory. 


Theorem 9.B.34. Let $L/K$ be a Galois extension of number fields, with
$G = \mathrm{Gal}(L/K)$. Let $\wp$ be a nonzero prime ideal of $\mathcal{O}_K$ and let $\beta_1$ and $\beta_2$ be
two prime ideals of $\mathcal{O}_L$ which lie over $\wp$. Then there exists $\sigma \in G$ such that$^{11}$

$$\beta_2 = \sigma(\beta_1).$$

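Proof. Suppose that $\beta_2 \neq \sigma(\beta_1)$ for all $\sigma \in G$. Using the general version$^{12}$
of the Chinese Remainder Theorem, we obtain the existence of $a \in \beta_2$ such
that $a \notin \sigma(\beta_1)$ for all $\sigma \in G$. Then $\prod_{\sigma \in G} \sigma(a) \in \mathcal{O}_K \cap \beta_2 = \wp \subset \beta_1$ and this
contradicts the fact that $\beta_1$ is a prime ideal and $a \notin \sigma^{-1}(\beta_1)$ for any $\sigma \in G$.
The result follows. $\square$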


Corollary 9.B.35. Let $L/K$ be a Galois extension of number fields and let
$\wp$ be a nonzero prime ideal of $\mathcal{O}_K$. Then all primes above $\wp$ have the same
ramification index and the same residual degree.


Proof. For the residual degrees, note that any $\sigma \in G$ induces a bijection
between $\mathcal{O}_L/\beta$ and $\mathcal{O}_L/\sigma(\beta)$, so these two sets have the same number of elements.
We conclude by the previous theorem. Next, if $e$ is a positive integer such that


$^{11}$Note that by definition $\sigma(\beta_1)$ is the set of all $\sigma(x)$, with $x \in \beta_1$. It is easy to check that
this is again a prime ideal of $\mathcal{O}_L$ and that $\sigma(\beta_1) \cap \mathcal{O}_K = \wp$.

$^{12}$This is stated as follows: let $A$ be a commutative ring and let $I_1, I_2, \dots, I_n$ be ideals
such that $I_i + I_j = A$ for all $i \neq j$ (which is satisfied if the $I_i$ are different maximal ideals of $A$,
for instance). Then for any $x_1, \dots, x_n \in A$ there exists $x \in A$ such that $x - x_i \in I_i$ for all $i$.




$\beta^e$ divides $\wp\mathcal{O}_L$, then $\sigma(\beta)^e$ divides $\sigma(\wp\mathcal{O}_L) = \wp\mathcal{O}_L$. The result follows again
easily from the previous theorem. $\square$


Definition 9.B.36. Let $L/K$ be a Galois extension of number fields, with
Galois group $G$. Let $\beta$ be a prime of $\mathcal{O}_L$. The decomposition group of $\beta$ is

$$D_\beta = \{ \sigma \in G \mid \sigma(\beta) = \beta \}.$$


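Note that $D_\beta$ is indeed a group and that any $\sigma \in D_\beta$ induces an
automorphism $\overline{\sigma} : \mathcal{O}_L/\beta \to \mathcal{O}_L/\beta$, which is trivial on $\mathcal{O}_K/\wp$ for $\wp = \beta \cap \mathcal{O}_K$.
Hence, we have a natural map $D_\beta \to \mathrm{Gal}((\mathcal{O}_L/\beta)/(\mathcal{O}_K/\wp))$. Note that
$\mathrm{Gal}((\mathcal{O}_L/\beta)/(\mathcal{O}_K/\wp))$ is a cyclic group, generated by the automorphism
$x \mapsto x^{N(\wp)}$, since $(\mathcal{O}_L/\beta)/(\mathcal{O}_K/\wp)$ is a finite extension of finite fields (see
the addendum 9.A for the structure of finite fields). The following result is
rather tricky, but fundamental.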


Theorem 9.B.37. With the previous notations, the map

$$D_\beta \to \mathrm{Gal}((\mathcal{O}_L/\beta)/(\mathcal{O}_K/\wp))$$

is surjective.


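Proof. We may assume that $\mathcal{O}_L/\beta \neq \mathcal{O}_K/\wp$, as otherwise the statement is
trivial. Let $\overline{a}$ be a generator of the cyclic group $(\mathcal{O}_L/\beta)^*$, so that clearly
$\mathcal{O}_L/\beta = (\mathcal{O}_K/\wp)[\overline{a}]$. Using the general form of the Chinese Remainder
Theorem (see the proof of theorem 9.B.34), we can find $a \in \mathcal{O}_L$ such that $a
\pmod{\beta} = \overline{a}$ and $a \in \beta'$ for any $\beta' \neq \beta$ above $\wp$. Let $F$ be the minimal
polynomial of $a$ over $K$. Then $F \in \mathcal{O}_K[X]$ and $F(a) = 0$, hence $\overline{F}(\overline{a}) = 0$
in $\mathcal{O}_L/\beta$. But then$^{13}$ $\overline{F}(\overline{a}^{N(\wp)}) = 0$, so that $F(a^{N(\wp)}) \in \beta$. So, we can find a
conjugate $b$ of $a$ so that $b - a^{N(\wp)} \in \beta$. It is not difficult$^{14}$ to see that there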


$^{13}$Recall that if $\mathbb{F}_q$ is the field with $q$ elements, then for all $f \in \mathbb{F}_q[X]$ we have $f(X^q) =
f(X)^q$, so if $a$ is a root of $f$ in some extension of $\mathbb{F}_q$, then $a^q$ is also a root of $f$.

$^{14}$Since $L/K$ is Galois, $L$ contains all conjugates of $a$ over $K$, i.e. all roots of $F$. Let $M$
be the field generated over $K$ by these conjugates, i.e. the splitting field of $F$. Then $M/K$
is a Galois subextension of $L/K$, thus any element of $\mathrm{Gal}(M/K)$ extends to an element of
$\mathrm{Gal}(L/K)$. But the isomorphisms $K[X]/(F) \to M$ sending $X$ to $a$ and $b$ respectively yield
$\sigma \in \mathrm{Gal}(M/K)$ such that $\sigma(a) = b$.




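exists $\sigma \in \mathrm{Gal}(L/K)$ such that $\sigma(a) = b$. We claim that $\sigma$ is in $D_\beta$ and that
it induces the automorphism $x \mapsto x^{N(\wp)}$ of $\mathcal{O}_L/\beta$. First, if $\sigma(\beta) \neq \beta$, then
$\sigma^{-1}(\beta) \neq \beta$ and so by our choice of $a$ we have $a \in \sigma^{-1}(\beta)$, i.e. $\sigma(a) \in \beta$. But
$\sigma(a) - a^{N(\wp)} \in \beta$, forcing $a^{N(\wp)} \in \beta$ and then $a \in \beta$, a contradiction with $a
\pmod{\beta} = \overline{a} \neq 0$. Thus $\sigma \in D_\beta$. The automorphism induced by $\sigma$ on $\mathcal{O}_L/\beta$
is uniquely determined by its action on $\overline{a}$ (generator of $(\mathcal{O}_L/\beta)^*$) and by our
choice this action sends $\overline{a}$ to $\overline{a}^{N(\wp)}$. The result follows. $\square$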


Let us fix a prime $\beta$ above $\wp$ in $L$. By theorem 9.B.34, all primes above $\wp$
are of the form $\sigma(\beta)$ with $\sigma \in \mathrm{Gal}(L/K)$, i.e. $\mathrm{Gal}(L/K)$ permutes transitively
the different primes $\beta_1, \dots, \beta_g$ over $\wp$. Since there are precisely $|D_\beta|$ elements
of $\mathrm{Gal}(L/K)$ which fix $\beta$, we obtain $[L : K] = |\mathrm{Gal}(L/K)| = g \cdot |D_\beta|$. But we
also have $[L : K] = e(\beta/\wp) \cdot f(\beta/\wp) \cdot g$, since all ramification indices and residual
degrees of the $\beta_i$'s are the same. Hence $|D_\beta| = e(\beta/\wp) \cdot f(\beta/\wp)$. In particular,
if $\wp$ is unramified in $L$, i.e. if $e(\beta/\wp) = 1$, then $|D_\beta| = f(\beta/\wp)$, which is
the same as the cardinality of $\mathrm{Gal}((\mathcal{O}_L/\beta)/(\mathcal{O}_K/\wp))$. We obtain therefore the
following crucial result:


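Theorem 9.B.38. Let $L/K$ be a Galois extension of number fields and let $\wp$
be a prime of $\mathcal{O}_K$ unramified in $\mathcal{O}_L$. Let $\beta \mid \wp$ be a prime of $\mathcal{O}_L$ over $\wp$. Then
the map $D_\beta \to \mathrm{Gal}((\mathcal{O}_L/\beta)/(\mathcal{O}_K/\wp))$ is a bijection. In particular, there exists
a unique $(\beta, L/K) \in D_\beta$ such that $(\beta, L/K)(x) \equiv x^{N(\wp)} \pmod{\beta}$ for all $x \in \mathcal{O}_L$. We
call $(\beta, L/K)$ the Frobenius substitution of $\beta$ in $L/K$.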


Remark 9.B.39. Assume that we are only given $\wp$. The choice of some $\beta$ over $\wp$
yields a Frobenius substitution $(\beta, L/K)$, but this depends on the choice of $\beta$.
However, any other prime above $\wp$ is of the form $\sigma(\beta)$ for some $\sigma \in \mathrm{Gal}(L/K)$
and it is immediate to check that $(\sigma(\beta), L/K) = \sigma \circ (\beta, L/K) \circ \sigma^{-1}$ (indeed,
the right-hand side is in $D_{\sigma(\beta)}$, which is trivially equal to $\sigma D_\beta \sigma^{-1}$, and satisfies
the congruence that uniquely characterizes it). Hence, the conjugacy class of
$(\beta, L/K)$ in $\mathrm{Gal}(L/K)$ does not depend on the choice of $\beta$ and is called the
Frobenius conjugacy class of $\wp$. Note that if $\mathrm{Gal}(L/K)$ is abelian, then this
conjugacy class is reduced to an element, which we denote $(\wp, L/K)$: it is
$(\beta, L/K)$ for any $\beta$ over $\wp$.
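In the abelian extension $\mathbb{Q}(i)/\mathbb{Q}$ the Frobenius element of an odd prime $p$ can be identified by brute force: it is the unique automorphism (identity or complex conjugation) that agrees with $x \mapsto x^p$ modulo $p$ on $\mathbb{Z}[i]$. The sketch below performs this check; the function names are ours, and elements $a + bi$ are stored as pairs of residues modulo $p$.

```python
def gauss_pow_mod(a, b, n, p):
    """Return (a + bi)^n in Z[i]/pZ[i] as a pair of residues, by square-and-multiply."""
    result, base = (1, 0), (a % p, b % p)
    while n:
        if n & 1:
            result = ((result[0] * base[0] - result[1] * base[1]) % p,
                      (result[0] * base[1] + result[1] * base[0]) % p)
        base = ((base[0] ** 2 - base[1] ** 2) % p, (2 * base[0] * base[1]) % p)
        n >>= 1
    return result

def frobenius_is_conjugation(p):
    """For an odd prime p, decide whether x -> x^p acts on Z[i]/pZ[i] as complex conjugation."""
    residues = [(a, b) for a in range(p) for b in range(p)]
    acts_as_conj = all(gauss_pow_mod(a, b, p, p) == (a, (-b) % p) for a, b in residues)
    acts_as_id = all(gauss_pow_mod(a, b, p, p) == (a, b) for a, b in residues)
    assert acts_as_conj != acts_as_id   # exactly one of the two automorphisms matches
    return acts_as_conj

frobenius = {p: frobenius_is_conjugation(p) for p in (3, 5, 7, 11, 13)}
```

The Frobenius element is complex conjugation for $p \equiv 3 \pmod 4$ (inert primes) and the identity for $p \equiv 1 \pmod 4$ (split primes), consistent with $|D_\beta| = f(\beta/\wp)$ in the unramified case.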


We leave it to the reader to check the following easy results:




Proposition 9.B.40. Let $M/L/K$ be a tower of Galois extensions of number
fields and let $\rho \mid \beta \mid \wp$ be a tower of prime ideals. Assume that $\wp$ is unramified
in $M$ (thus $\beta$ is unramified in $M$ and $\wp$ is unramified in $L$). Then we have
the following relations.


a) $(\rho, M/L) = (\rho, M/K)^{f(\beta/\wp)}$. Actually, this relation still holds if M/K is
not necessarily Galois.


b) $(\rho, M/K)|_L = (\beta, L/K)$.


9.B.9 Bauer’s theorem and Chebotarev’s density theorem 


Before stating the following fundamental result, we need one more
definition. If $S$ is a set of primes, its Dirichlet density is

$$d(S) = \lim_{s \to 1} \frac{\sum_{p \in S} p^{-s}}{\sum_{p} p^{-s}},$$

if the limit exists. The next very deep theorem was conjectured (and proved
in some special cases) by Frobenius. It is a vast generalization of Dirichlet's
theorem.


Theorem 9.B.41. (Chebotarev) Let $K$ be a finite Galois extension of $\mathbb{Q}$ of
degree $n$, with Galois group $G$, and let $g \in G$. Let $S$ be the set of primes $p$ such that $(\wp, K/\mathbb{Q})$
is conjugate to $g$ for all prime divisors $\wp$ of $p$. Then $d(S) = \frac{1}{n} \left| \{ hgh^{-1} \mid h \in G \} \right|$.

Here is a typical application of this deep theorem.

Example 9.B.42. Let $l$ be a prime and consider the set $S$ of those primes $p \equiv 1
\pmod{l}$ such that $2^{\frac{p-1}{l}} \equiv 1 \pmod{p}$. These two conditions are equivalent
to the statement that $X^l - 2$ splits into distinct linear factors in $\mathbb{F}_p[X]$ (as
$2^{\frac{p-1}{l}} \equiv 1 \pmod{p}$ is equivalent to the existence of $y \in \mathbb{F}_p$ such that $y^l = 2$).
This is also equivalent to $p$ being a product of different primes in $\mathcal{O}_K$, where
$K = \mathbb{Q}(e^{\frac{2i\pi}{l}}, \sqrt[l]{2})$ is the splitting field of $X^l - 2$. Using again Chebotarev's
theorem, we deduce that $d(S) = \frac{1}{[K : \mathbb{Q}]}$. But $K$ contains $\mathbb{Q}(e^{\frac{2i\pi}{l}})$ and $\mathbb{Q}(\sqrt[l]{2})$,
which have degrees $l - 1$, respectively $l$. Thus $[K : \mathbb{Q}]$ is a multiple of $l(l - 1)$
and, being at most $l(l - 1)$, necessarily $[K : \mathbb{Q}] = l(l - 1)$ and $d(S) = \frac{1}{l(l-1)}$.
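The set $S$ of this example is easy to enumerate, which gives an empirical feeling for the density $\frac{1}{l(l-1)}$. The sketch below lists the primes in $S$ up to a bound and computes their proportion among all primes up to that bound; note that this is a naive proportion over a finite range, used here only as a heuristic stand-in for the Dirichlet density, and the function names are ours.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes for n >= 2."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

def split_primes(l, bound):
    """Primes p <= bound with p = 1 (mod l) and 2^((p-1)/l) = 1 (mod p)."""
    return [p for p in primes_up_to(bound)
            if p % l == 1 and pow(2, (p - 1) // l, p) == 1]

def observed_proportion(l, bound):
    """Proportion of primes up to bound lying in S (a finite-range heuristic)."""
    return len(split_primes(l, bound)) / len(primes_up_to(bound))

small_cubic_case = split_primes(3, 50)
small_quadratic_case = split_primes(2, 50)
```

For $l = 3$ the first primes in $S$ are 31 and 43, and for $l = 2$ one recovers the primes $p \equiv \pm 1 \pmod 8$; calling `observed_proportion(3, bound)` for growing bounds lets one compare the observed proportion with the predicted value $\frac{1}{6}$.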




We will also need the following easy consequence of Chebotarev’s density 
theorem: 


Proposition 9.B.43. Let $\sigma \in \mathrm{Gal}(L/K)$. There are infinitely many prime
ideals $\wp$ of degree 1 of $K$ such that one can find $\beta \mid \wp$ in $L$ with $(\beta, L/K) = \sigma$.


Proof. Take a Galois extension $E$ over $\mathbb{Q}$ which contains $L$. Since $L/K$ is
Galois, we can extend $\sigma$ to an element, again denoted $\sigma$, of $\mathrm{Gal}(E/K)$.
By Chebotarev, we can find infinitely many primes $p$ such that there is $\rho \mid p$
in $E$ with $(\rho, E/\mathbb{Q}) = \sigma$. Then $D_\rho \subset \mathrm{Gal}(E/K)$, so $\wp = \rho \cap K$ has degree 1
(lemma 9.B.49). Let $\beta = \rho \cap \mathcal{O}_L$. Then

$$(\beta, L/K) = (\rho, E/K)|_L = (\rho, E/\mathbb{Q})|_L = \sigma$$

and we are done. $\square$


Definition 9.B.44. If $K$ is a number field, let $P_1(K)$ be the set of rational
primes $p$ for which $pO_K$ has at least one prime ideal factor $\wp$ of residual degree
1 (i.e. such that $O_K/\wp$ is the field with $p$ elements).


The following result explains why $P_1(K)$ is an interesting object:


Proposition 9.B.45. Let $f \in \mathbb{Z}[X]$ be a monic irreducible polynomial, let $\theta$
be a root of $f$ and let $K = \mathbb{Q}(\theta)$. Then there is $c$ such that

$$P_1(K) \cap [c, \infty) = \{p \ge c \mid \exists\, x \in \mathbb{Z} \text{ such that } p \mid f(x)\}.$$


Proof. If $p$ is a sufficiently large prime, then the residual degrees of the prime
divisors of $pO_K$ are given by the degrees of the irreducible factors of $f \in \mathbb{F}_p[X]$,
by theorem 9.B.14. Hence, for such $p$ we have $p \in P_1(K)$ if and only if $f$ has a
linear factor in $\mathbb{F}_p[X]$, i.e. if and only if there exists $x \in \mathbb{Z}$ such that $p \mid f(x)$. □


Here is a rather nice application of this proposition: 


Theorem 9.B.46. (Nagell) If $f \in \mathbb{Z}[X]$, let $P(f)$ be the set of primes
$p$ for which the congruence $f(x) \equiv 0$ (mod $p$) has at least one solution.
If $f_1, f_2, \ldots, f_k$ are nonconstant polynomials with integer coefficients, then
$P(f_1) \cap P(f_2) \cap \cdots \cap P(f_k)$ is infinite.




Proof. The case $k = 1$ is a classical result due to Schur, but for the reader’s
convenience let us recall the proof. If $f_1(0) = 0$, everything is clear. Otherwise, consider
the numbers $x_n = \frac{f_1(n!\, f_1(0))}{f_1(0)}$. We have $|x_n| > 1$ and $x_n \equiv 1$ (mod $n!$) if $n$ is
large enough, so there exists a prime $p_n \mid x_n$ with $p_n > n$ and the result follows.
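
The construction in the case $k = 1$ is easy to experiment with. The sketch below is a minimal illustration under our own assumptions: the polynomial $X^2 + X + 3$ and the helper names are hypothetical choices, and the smallest prime factor is found by naive trial division. It computes $x_n = f(n!\,f(0))/f(0)$ and its smallest prime factor, which must exceed $n$.

```python
from math import factorial


def evaluate(coeffs, x):
    """Horner evaluation; coeffs = [c0, c1, ...] with the lowest degree first."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def smallest_prime_factor(m):
    """Smallest prime factor of |m| >= 2, by trial division."""
    m = abs(m)
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m


def schur_number(coeffs, n):
    """x_n = f(n! * f(0)) / f(0), assuming f(0) != 0 (the division is exact)."""
    f0 = coeffs[0]
    return evaluate(coeffs, factorial(n) * f0) // f0


f = [3, 1, 1]  # hypothetical example: f(X) = X^2 + X + 3
factors = {n: smallest_prime_factor(schur_number(f, n)) for n in range(1, 7)}
print(factors)
```

Each value `factors[n]` is a prime in $P(f)$ larger than $n$, so letting $n$ grow produces infinitely many elements of $P(f)$.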

The general case is more difficult. We may assume that the $f_i$ are irreducible and
monic. Let $x_i$ be a root of $f_i$ and let $K_i = \mathbb{Q}(x_i)$. Let $K$ be the least number
field containing all $K_i$’s and write $K = \mathbb{Q}(x)$ for some $x \in O_K$ with minimal
polynomial $f$. Applying the case $k = 1$ to $f$ and using the previous proposition,
we see that we can find infinitely many $p$ which have a prime $\wp$ of degree 1 in
$K$. Let $\beta_i = \wp \cap O_{K_i}$. Since $f(\wp/p) = 1$, we also have $f(\beta_i/p) = 1$. Applying
once more theorem 9.B.22 (this time for each $K_i$) we see that all but finitely
many primes $p$ (among the infinitely many we have just found) belong to
$\bigcap_{i=1}^{k} P(f_i)$, which is therefore infinite. □
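
Nagell's theorem can be observed on a small example. The following sketch is purely illustrative: the two polynomials $X^2 + 1$ and $X^2 - 2$, the bound 100 and the function names are our own choices. It lists the primes up to the bound lying in $P(f_1) \cap P(f_2)$ by brute force.

```python
def is_prime(m):
    """Naive primality test by trial division."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def evaluate(coeffs, x):
    """Value at x of the polynomial with coefficients [c0, c1, ...]."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def P(coeffs, bound):
    """Primes p <= bound such that f(x) = 0 (mod p) has a solution."""
    return {
        p
        for p in range(2, bound + 1)
        if is_prime(p) and any(evaluate(coeffs, x) % p == 0 for x in range(p))
    }


f1 = [1, 0, 1]   # X^2 + 1
f2 = [-2, 0, 1]  # X^2 - 2
common = sorted(P(f1, 100) & P(f2, 100))
print(common)  # [2, 17, 41, 73, 89, 97]
```

Apart from 2, the common primes are exactly those congruent to 1 modulo 8, in agreement with the classical description of the primes for which $-1$ and $2$ are both squares.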


The following beautiful theorem due to Bauer was discovered before Cheb- 
otarev proved his theorem. It is however more conceptual nowadays to see 
Bauer’s theorem as a consequence of Chebotarev’s density theorem, which is 
what we will do. 


Theorem 9.B.47. (Bauer) Let $K_1$ be a number field and let $K_2$ be a number
field which is Galois over $\mathbb{Q}$. Suppose that there exists a set of primes $S$ of
Dirichlet density 0, with the following property: if $p \notin S$ is in $P_1(K_1)$, then $p$
is completely split in $K_2$. Then $K_2 \subset K_1$.


Proof. We start with two easy lemmas, which express the condition $p \in P_1(K)$
in a group-theoretic way. They are also results of independent interest and
will be used in other applications.


Lemma 9.B.48. Let $L$ be a number field which is Galois over $\mathbb{Q}$, with group
$G$. Let $\mathbb{Q} \subset K \subset L$ be a subfield and let $H = \mathrm{Gal}(L/K)$. Suppose that $\beta \mid \wp \mid p$
is a chain of prime ideals in $L, K, \mathbb{Q}$ such that $p$ is unramified in $L$. Then


$f(\wp/p) = 1$ if and only if $D_\beta \subset H$.


Proof. Note that the decomposition group of $\beta$ with respect to the extension
$L/K$ is simply $D_\beta \cap H$. But since $p$ is unramified in $L$, $\wp$ is unramified in
$L/K$ and so the decomposition group of $\beta$ with respect to $L/K$ has $f(\beta/\wp)$
elements. That is, $|H \cap D_\beta| = f(\beta/\wp)$. But this equals also $\frac{f(\beta/p)}{f(\wp/p)} = \frac{|D_\beta|}{f(\wp/p)}$.
The result follows. □


Lemma 9.B.49. Assume that $L$ and $K$ are as in the previous lemma and let
$p$ be a prime which is unramified in $L$. Then


1) $p$ has a prime factor $\wp$ in $K$ with $f(\wp/p) = 1$ if and only if there is $\beta \mid p$
in $L$ such that $D_\beta \subset H$.


2) $p$ is completely split in $K$ if and only if for all $\beta \mid p$ in $L$, we have $D_\beta \subset H$.


Proof. 1) is an immediate consequence of the previous lemma. For 2), it is
clear that if $p$ is completely split in $K$, then for any $\beta \mid p$ we have $D_\beta \subset H$ (as if
$\wp = \beta \cap O_K$, then necessarily $f(\wp/p) = 1$). Conversely, if $D_\beta \subset H$ for any $\beta \mid p$,
then the previous lemma implies that $f(\wp/p) = 1$ for any $\wp \mid p$ in $K$. But note
that $p$ is unramified in $K$, as it is already in $L$. Thus $p$ must be completely
split in $K$. □


Let us prove now the theorem. Choose¹⁵ a finite Galois extension $L$ of
$\mathbb{Q}$ containing $K_1$ and $K_2$ and let $H_i = \mathrm{Gal}(L/K_i)$. By the main theorem of
Galois theory (theorem 9.B.32), it is enough to prove that $H_1 \subset H_2$. Choose
any $\sigma \in H_1$. By Chebotarev’s theorem, there is $p \notin S$, unramified in $L$, such that $p$ has a prime
factor $\beta$ in $L$ with $D_\beta = \langle \sigma \rangle$ (the cyclic subgroup of $H_1$ generated by $\sigma$). By
lemma 9.B.49, we have $p \in P_1(K_1)$ and so $p$ is totally split in $K_2$, which forces,
by the same lemma, $D_\beta \subset H_2$. Hence $\sigma \in H_2$ and we are done. □


Remark 9.B.50. Here is another useful consequence of lemma 9.B.49. Let
$n > 1$ be an integer and let $H$ be a subgroup of $(\mathbb{Z}/n\mathbb{Z})^*$. We claim that a
prime $p$ not dividing $n$ is totally split in $\mathbb{Q}(\zeta_n)^H$ if and only if $p$ (mod $n$) $\in H$.
Indeed, such a prime is unramified in $\mathbb{Q}(\zeta_n)$ and by lemma 9.B.49 it is totally
split in $\mathbb{Q}(\zeta_n)^H$ if and only if for any $\beta \mid p$ in $\mathbb{Q}(\zeta_n)$ we have $D_\beta \subset H$. Since
$\mathrm{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})$ is abelian, this is also equivalent to $(p, \mathbb{Q}(\zeta_n)/\mathbb{Q}) \in H$, which
is saying that $p$ (mod $n$) $\in H$ (we naturally identified $H$ with a subgroup of
$\mathrm{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})$).


¹⁵For instance, if $K_1 = \mathbb{Q}(x)$ and $K_2 = \mathbb{Q}(y)$, consider the extension generated over $\mathbb{Q}$ by
all conjugates of $x$ and $y$.




9.B.10 Finally, a reward: applications to 
“elementary-looking” problems 


In this section we show how the deep results in the previous sections 
can be combined to yield some fairly nontrivial theorems with “elementary” 
aspects. The first one uses a generalization of Bertrand’s postulate to obtain 
the following difficult irreducibility result. 


Theorem 9.B.51. (Schur) If $n \ge 2$ and $a_1, a_2, \ldots, a_{n-1}$ are integers, then
the polynomial

$$\frac{X^n}{n!} + a_{n-1}\frac{X^{n-1}}{(n-1)!} + \cdots + a_2\frac{X^2}{2!} + a_1 X + 1$$

is irreducible in $\mathbb{Q}[X]$.


Proof. It is enough to prove that $f = X^n + n a_{n-1} X^{n-1} + \cdots + n!$ is irreducible
in $\mathbb{Z}[X]$. Suppose that this is not the case and let $g$ be a monic irreducible
factor of $f$ of degree $m$, with $m \le \frac{n}{2}$. Choose any prime divisor $p$ of
$n(n-1)\cdots(n-m+1)$ and consider the reduction $\bar{f}$ of $f$ mod $p$. The choice of $p$
implies that $X^{n-m+1}$ divides $\bar{f}$. If $f = g \cdot h$, it follows that $X^{n-m+1}$ divides
$\bar{g} \cdot \bar{h}$ and since $\deg h = n - m$, we must have $X \mid \bar{g}$ and so $p \mid g(0)$.

Let $z$ be a root of $g$ and let $K = \mathbb{Q}[z]$, a number field of degree $m$. Since
$p \mid g(0)$, it follows that $p$ divides $N(z \cdot O_K)$ and so we can choose a prime $\wp$ of
$O_K$ dividing $zO_K$ and such that $\wp \mid p$. Let $e = e(\wp/p) \le [K : \mathbb{Q}] = m$ be the
ramification index of $\wp$. Since $f(z) = 0$, we obtain (set $a_n = 1$)

$$v_\wp(n!) \ge \min_{1 \le i \le n} v_\wp\left(a_i z^i \frac{n!}{i!}\right),$$

so we can find $i$ such that

$$v_p(i!) \ge \frac{i\, v_\wp(z)}{e} \ge \frac{i}{e}.$$

Using the inequalities $v_p(i!) < \frac{i}{p-1}$ and $e \le m$, we deduce that $p \le m$.

We have therefore proved that all prime factors of $n(n-1)\cdots(n-m+1)$
are less than or equal to $m$. This contradicts a famous theorem of Sylvester
and Schur, which generalizes Bertrand’s postulate and for a proof of which we
refer the reader to [26]. □


Theorem 9.B.52. If $n \ge 2k$, then $\binom{n}{k}$ has a prime divisor greater than $k$.

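
The Sylvester-Schur statement is easy to test on small cases. The sketch below is only a sanity check over an arbitrarily chosen range, with helper names of our own; it relies on the fact that every prime divisor of $\binom{n}{k}$ is at most $n$.

```python
from math import comb


def is_prime(m):
    """Naive primality test by trial division."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def large_prime_divisor(n, k):
    """Return a prime p > k dividing C(n, k), or None if there is none."""
    c = comb(n, k)
    for p in range(k + 1, n + 1):
        if is_prime(p) and c % p == 0:
            return p
    return None


# Check the statement for all 1 <= k <= 30 and 2k <= n <= 60.
checked = all(
    large_prime_divisor(n, k) is not None
    for k in range(1, 31)
    for n in range(2 * k, 61)
)
print(checked)
```

The hypothesis $n \ge 2k$ cannot simply be dropped: $\binom{4}{3} = 4$ has no prime divisor greater than 3.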
Let us consider now the second application. 


Theorem 9.B.53. (Davenport, Lewis, Schinzel) Let $f \in \mathbb{Q}[X]$ be a polynomial
with the following property: any arithmetic progression $P \subset \mathbb{Z}$ contains
an integer $x$ such that $f(x)$ is the sum of squares of two rational numbers.
Then there exist polynomials $g, h \in \mathbb{Q}[X]$ such that $f = g^2 + h^2$.


Proof. We may assume that $f$ is not constant. Using Gauss’ lemma, we can
write $f = c f_1^{e_1} \cdots f_r^{e_r}$ for some $c \in \mathbb{Q}^*$, some primitive, distinct, irreducible
polynomials $f_i \in \mathbb{Z}[X]$ and some $e_i \in \mathbb{N}^*$. Fix an index $j$ for which $e_j$ is an
odd number and let $\theta$ be a root of $f_j$. We claim that $L = \mathbb{Q}(\theta)$ contains $\mathbb{Q}(i)$.
Using part a) of Bauer’s theorem 9.B.47, it is enough to prove that if $q$ is
a sufficiently large prime in $P_1(\mathbb{Q}(\theta))$, then $q \in P_1(\mathbb{Q}(i))$. We will need the
following standard argument:


Lemma 9.B.54. There exists a prime $q_0$ with the following property: for all
$q \in P_1(L) \cap [q_0, \infty)$, one can find an integer $x$ such that $v_q(f_j(x)) = 1$ and
$v_q(f_k(x)) = 0$ for all $k \ne j$.


Proof. First, choose a prime $q_1 > [O_L : \mathbb{Z}[\theta]]$. Take a prime number $q_2$ greater
than all prime factors of (the numerator or denominator of) $c$ and such that
$q$ does not divide $\gcd(f_j(x), f_j'(x) \cdot \prod_{k \ne j} f_k(x))$ for any $x \in \mathbb{Z}$ and any $q \ge q_2$.
To prove the existence of $q_2$, note that $f_j$ is relatively prime to $f_j' \cdot \prod_{k \ne j} f_k$,
write a Bézout relation over $\mathbb{Q}$ for $f_j$ and $f_j' \cdot \prod_{k \ne j} f_k$ and clear denominators.
We claim that $q_0 = \max(q_1, q_2)$ works. Indeed, let $q \in P_1(L) \cap [q_0, \infty)$. By
Dedekind-Kummer’s theorem 9.B.22, we can find $y \in \mathbb{Z}$ such that $q$ divides
$f_j(y)$. Then $q$ does not divide $f_j'(y)$ or $\prod_{k \ne j} f_k(y)$, so one of $y$ or $y + q$ satisfies
the desired conditions. □
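
The last step of this proof rests on the congruence $f_j(y + q) \equiv f_j(y) + q f_j'(y)$ (mod $q^2$): if $q \mid f_j(y)$ and $q \nmid f_j'(y)$, then $f_j(y)$ and $f_j(y + q)$ cannot both be divisible by $q^2$. The following sketch illustrates this on a hypothetical example of our own choosing ($f_j = X^2 + 1$, $q = 5$, $y = 7$); the function names are not from the text.

```python
def valuation(m, q):
    """q-adic valuation of a nonzero integer m."""
    v = 0
    while m % q == 0:
        m //= q
        v += 1
    return v


def f(x):
    """Hypothetical example polynomial f_j(X) = X^2 + 1."""
    return x * x + 1


def f_prime(x):
    """Derivative of the example polynomial."""
    return 2 * x


q, y = 5, 7
print(valuation(f(y), q))      # 2, since f(7) = 50
print(valuation(f(y + q), q))  # 1, since f(12) = 145 = 5 * 29
```

Here $y = 7$ fails the condition $v_q(f_j(y)) = 1$, but $y + q = 12$ satisfies it, exactly as the lemma predicts.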


Now, fix $q_0$ as in the previous lemma and let $q \in P_1(L) \cap [q_0, \infty)$ and
$x$ as in the lemma. By hypothesis we can find $y \equiv x$ (mod $q^2$) such that
$f(y) = a^2 + b^2$ for some rational numbers $a, b$. Note that $v_q(f_j(y)) = 1$ and
$v_q(f_k(y)) = 0$ for $k \ne j$. Then $e_j = v_q(f(y)) = v_q(a^2 + b^2)$ is odd by our
choice of $j$. We deduce that $q \equiv 1$ (mod 4) and so $q$ is split in $\mathbb{Q}(i)$, hence
$q \in P_1(\mathbb{Q}(i))$. This proves the claim made in the previous paragraph and we
conclude that $i = \sqrt{-1} \in L$. Hence we can write $i = h(\theta)$ for some $h \in \mathbb{Q}[X]$.
Then $f_j$ must divide $h^2 + 1$. If $G = \gcd(h - i, f_j) \in \mathbb{Q}(i)[X]$, it is easy to see
that $N_{\mathbb{Q}(i)/\mathbb{Q}}(G) = \gcd(h - i, f_j) \cdot \gcd(h + i, f_j)$ is a polynomial with rational
coefficients dividing $f_j$. We deduce that $f_j$ is a constant times $N_{\mathbb{Q}(i)/\mathbb{Q}}(G)$,
and this last polynomial is obviously the sum of squares of two polynomials
with rational coefficients.

Applying the previous arguments for each $f_j$ such that $e_j$ is odd, we
deduce that $f$ is of the form $c(u^2 + v^2)$ for a constant $c$ and some $u, v \in \mathbb{Q}[X]$.
But using once more the hypothesis, we deduce that $c$ is the sum of squares
of two rational numbers and the result follows. □


Remark 9.B.55. Actually, Davenport, Lewis and Schinzel prove a more general
result: let $K/\mathbb{Q}$ be a Galois extension of degree $n$ and let $\omega_1, \omega_2, \ldots, \omega_n$ be
such that $O_K = \mathbb{Z}\omega_1 + \mathbb{Z}\omega_2 + \cdots + \mathbb{Z}\omega_n$. Suppose that $f \in \mathbb{Q}[X]$ has the
property that any arithmetic progression of integers contains an integer $x$ for
which the equation $f(x) = N_{K/\mathbb{Q}}(u_1\omega_1 + u_2\omega_2 + \cdots + u_n\omega_n)$ is solvable in
$(u_1, u_2, \ldots, u_n) \in \mathbb{Q}^n$. If either $\mathrm{Gal}(K/\mathbb{Q})$ is a cyclic group or the multiplicity
of any zero of $f$ is prime to $n$, then one can find $u_1, u_2, \ldots, u_n \in \mathbb{Q}[X]$ such
that
$$f(X) = N_{K/\mathbb{Q}}(u_1(X)\omega_1 + u_2(X)\omega_2 + \cdots + u_n(X)\omega_n).$$


The proof follows precisely the same ideas, but is a bit more technical. 


Theorem 9.B.56. (Schinzel) Let $f \in \mathbb{Z}[X]$ be an irreducible polynomial over
$\mathbb{Q}$ and let $n$ be an integer greater than 1. Suppose that there is $c$ with the
following property: if $p > c$ divides $f(x)$ for some $x \in \mathbb{Z}$, then $p \equiv 1$ (mod $n$).
Then there are $a \in \mathbb{Q}$ and a polynomial $g \in \mathbb{Q}(\zeta_n)[X]$ such that

$$f(X) = a \cdot N_{\mathbb{Q}(\zeta_n)/\mathbb{Q}}(g(X)).$$


Proof. Let $\theta$ be a root of $f$ and let $K = \mathbb{Q}(\theta)$. We claim that $\mathbb{Q}(\zeta_n) \subset K$. By
Bauer’s theorem, it is enough to prove that if $p$ is a large prime having a factor
of degree 1 in $K$, then $p$ is completely split in $\mathbb{Q}(\zeta_n)$. But if $p$ is such a prime,
greater than $c$ and $n$, then $p$ divides some $f(x)$, so $p \equiv 1$ (mod $n$) and $p$ is
completely split in $\mathbb{Q}(\zeta_n)$.

Now, write $\zeta_n = u(\theta)$ for some $u \in \mathbb{Q}[X]$. Then, since $f$ is irreducible and
$\phi_n(u(\theta)) = 0$, the polynomial $f$ divides $\phi_n \circ u$ in $\mathbb{Q}[X]$ (recall that $\phi_n$ is the
$n$th cyclotomic polynomial, the minimal polynomial of $\zeta_n$ over $\mathbb{Q}$). For $j$ relatively
prime to $n$, define $f_j = \gcd(f, u - \zeta_n^j) \in \mathbb{Q}(\zeta_n)[X]$. Let $\sigma_j \in \mathrm{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})$ be
the automorphism sending $\zeta_n$ to $\zeta_n^j$. Then clearly $f_j = \sigma_j(f_1)$, so

$$\prod_{\gcd(j,n)=1} f_j = N_{\mathbb{Q}(\zeta_n)/\mathbb{Q}}(f_1).$$

Clearly, the polynomials $f_j$ are pairwise relatively prime, so their product
divides $f$. But, as we have seen, their product has rational coefficients, hence
by irreducibility of $f$ we must have $f = a \cdot N_{\mathbb{Q}(\zeta_n)/\mathbb{Q}}(f_1)$ for some constant $a$.
The result follows. □


Remark 9.B.57. It is much easier to check that the converse of the theorem 
also holds. 


Theorem 9.B.58. (Murty) Let $f \in \mathbb{Z}[X]$ be a polynomial and let $l, n$ be
relatively prime positive integers. Let $S$ be the set of primes $p$ for which the
congruence $f(x) \equiv 0$ (mod $p$) has solutions in $\mathbb{Z}$. Suppose that there exists $c$
such that any $p \in S$ greater than $c$ satisfies $p \equiv 1$ (mod $n$) or $p \equiv l$ (mod $n$).
Also, suppose that there are infinitely many primes $p \equiv l$ (mod $n$) in $S$. Then
$l^2 \equiv 1$ (mod $n$).


Proof. Let $\theta$ be a root of $f$ and let $K = \mathbb{Q}(\theta)$ and $L = K(\zeta_n)$, so $L/K$ is a
finite Galois extension (splitting field of $X^n - 1$). Let $H$ be the subgroup of
$\mathrm{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})$ generated by the automorphism $\sigma_l$, which sends $\zeta_n$ to $\zeta_n^l$. Note
that the hypothesis combined with Bauer’s theorem and with remark 9.B.50
yield an inclusion $M \subset K$, where $M = \mathbb{Q}(\zeta_n)^H$. Here is the crucial technical
result:


Lemma 9.B.59. Restriction to $\mathbb{Q}(\zeta_n)$ yields an isomorphism between
$\mathrm{Gal}(L/K)$ and $H$.




Proof. Since $M \subset K$, the restriction of any $\sigma \in \mathrm{Gal}(L/K)$ yields an element
of $H = \mathrm{Gal}(\mathbb{Q}(\zeta_n)/M)$. It is clear that if the restriction of $\sigma$ is trivial, then $\sigma$
is trivial, as $L = K(\zeta_n)$. Surjectivity is more delicate. Choose a prime $p \equiv l$
(mod $n$) which is unramified in $L$ (this excludes only finitely many primes) and
for which $K$ has a prime $\wp \mid p$ of degree 1. The existence of $p$ follows from the
second hypothesis. Let $\beta$ be a prime of $L$ lying over $\wp$ and let $\mathfrak{p} = \beta \cap \mathbb{Q}(\zeta_n)$, a
prime of $\mathbb{Q}(\zeta_n)$ lying over $p$. Let $\sigma = (\beta, L/K)$ be a Frobenius automorphism
associated to $\beta$. Since $\wp$ has degree 1, we have $\sigma(x) \equiv x^p$ (mod $\beta$) for all
$x \in O_L$. We claim that the restriction of $\sigma$ to $\mathbb{Q}(\zeta_n)$ is $\sigma_l$, which will prove
the surjectivity and end the proof of the lemma. But this restriction, call it
$\tau$, is an element of $\mathrm{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})$ which preserves $\mathfrak{p}$ and such that $\tau(x) \equiv x^p$
(mod $\mathfrak{p}$) for all $x \in \mathbb{Z}[\zeta_n]$, thus it has to be $(\mathfrak{p}, \mathbb{Q}(\zeta_n)/\mathbb{Q}) = \sigma_p$ (it is easy to see
that $\sigma_p$ satisfies the properties which uniquely characterize $(\mathfrak{p}, \mathbb{Q}(\zeta_n)/\mathbb{Q})$). It
remains to observe that $\sigma_p = \sigma_l$, as $p \equiv l$ (mod $n$). □


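It is now easy to finish the proof of the theorem: consider $\sigma_{l^2} = \sigma_l^2 \in H$.
By lemma 9.B.59 there is $\sigma \in \mathrm{Gal}(L/K)$ restricting to $\sigma_{l^2}$. By proposition
9.B.43 we can find infinitely many primes $\wp$ of degree 1 of $K$ such that
$(\beta, L/K) = \sigma$ for some $\beta \mid \wp$. These primes $\wp$ correspond therefore to primes
$p$ such that $p \equiv l^2$ (mod $n$) and for which $p \mid f(x)$ for some $x \in \mathbb{Z}$. The first
hypothesis yields therefore $l^2 \equiv 1$ (mod $n$) or $l^2 \equiv l$ (mod $n$). In the second case
$l \equiv 1$ (mod $n$), because $l$ is prime to $n$, so $l^2 \equiv 1$ (mod $n$) in both cases. The result
follows. □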
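
A concrete instance may help to see what the theorem says. In the sketch below (an illustration under our own choices, not taken from the text) we take $f = X^2 - 2$ and $n = 8$: by brute force, the prime divisors of the values of $f$ that do not divide $n$ fall into the residue classes 1 and 7 modulo 8, so $l = 7$, and indeed $l^2 \equiv 1$ (mod 8).

```python
def is_prime(m):
    """Naive primality test by trial division."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def prime_divisor_residues(f, n, bound):
    """Residues mod n of the primes p <= bound, p not dividing n,
    for which f(x) = 0 (mod p) has a solution."""
    return {
        p % n
        for p in range(2, bound + 1)
        if is_prime(p) and n % p != 0 and any(f(x) % p == 0 for x in range(p))
    }


# Assumed example: f(X) = X^2 - 2, n = 8, primes up to 200.
residues = prime_divisor_residues(lambda x: x * x - 2, 8, 200)
print(residues)  # {1, 7}
```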


Using the tools developed in this addendum, we are also able to prove 
theorem 2.A.6. For the reader’s convenience, let us recall the statement. The 
proof is adapted from the beautiful paper [32], where a much more general 
result is proved. 


Theorem 9.B.60. Let $n > 1$ and $a$ be integers such that $a$ is an $n$th power
modulo $p$ for all sufficiently large prime numbers $p$. Then either $a$ is the $n$th
power of an integer or $8 \mid n$ and $a = 2^{\frac{n}{2}} b^n$ for some integer $b$.
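
The exceptional case really occurs: for $n = 8$ and $a = 16 = 2^{4}$, the integer $a$ is not an 8th power, yet it is an 8th power modulo every prime. The sketch below checks this by brute force for the primes below an arbitrarily chosen bound; the function names are ours.

```python
def is_prime(m):
    """Naive primality test by trial division."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def is_nth_power_mod(a, n, p):
    """True when x^n = a (mod p) has a solution x."""
    return any(pow(x, n, p) == a % p for x in range(p))


# 16 is an 8th power modulo every prime below 300 ...
always_power = all(is_nth_power_mod(16, 8, p) for p in range(2, 300) if is_prime(p))
# ... although it is not the 8th power of an integer.
is_integer_power = any(b ** 8 == 16 for b in range(-2, 3))
print(always_power, is_integer_power)  # True False
```

By contrast, 2 is not a square modulo 3, in line with the statement for $n = 2$.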


Proof. We will use without comment the following well-known, but nontrivial
result: if an integer $a$ is a perfect square mod $p$ for all sufficiently large primes
$p$, then $a$ is a perfect square. While elementary proofs exist,¹⁶ this statement is
also an immediate consequence of Chebotarev’s density theorem 9.B.41 applied
to the splitting field of $X^2 - a$.

So, if n is even, we already know that a is a perfect square, in particular 
positive (the case a = 0 is trivial and excluded in this proof). Hence, by 
replacing a by —a if n is odd, we may and will assume that a > 0. The crucial 
ingredient of the proof is the following application of Bauer’s theorem 9.B.47: 


Lemma 9.B.61. If $z$ is a complex root of $X^n - a$, then $\mathbb{Q}(z) \subset \mathbb{Q}(\zeta_n)$, where
$\zeta_n = e^{\frac{2i\pi}{n}}$.


Proof. Let $K_1 = \mathbb{Q}(\zeta_n)$ and let $K_2$ be the splitting field of $X^n - a$ over $\mathbb{Q}$.
It is enough to prove that $K_2 \subset K_1$. Using Bauer’s theorem, it suffices to
prove that for all sufficiently large primes $p \in P_1(K_1)$, $p$ is completely split
in $K_2$. If $p$ does not divide $n$ (which certainly happens if $p$ is large enough),
the condition $p \in P_1(K_1)$ is equivalent to $n \mid p - 1$ by proposition 9.B.25. Also,
if $p$ is large enough, then $a$ is an $n$th power modulo $p$. These two conditions
imply that $X^n - a$ is a product of $n$ distinct linear factors in $\mathbb{F}_p[X]$. Using the
Dedekind-Kummer theorem 9.B.22,¹⁷ we deduce that $p$ is totally split in $K_2$,
which finishes the proof of the lemma. □


Coming back to the proof of the theorem, let $d$ be the least positive divisor
of $n$ for which $(\sqrt[n]{a})^d \in \mathbb{Q}$. Write $\sqrt[n]{a} = \sqrt[d]{c}$ for some $c \in \mathbb{Q}_+^*$. By lemma 9.19,
the polynomial $X^d - c$ is irreducible over $\mathbb{Q}$ and by lemma 9.B.61 we have
$\mathbb{Q}(\sqrt[d]{c}) \subset \mathbb{Q}(\zeta_n)$. Since $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ is Galois with commutative Galois group,
any subfield of $\mathbb{Q}(\zeta_n)$ is again Galois over $\mathbb{Q}$ and so $\mathbb{Q}(\sqrt[d]{c})/\mathbb{Q}$ is Galois. In
particular, $\zeta_d \cdot \sqrt[d]{c}$ (which is a conjugate of $\sqrt[d]{c}$) is again in $\mathbb{Q}(\sqrt[d]{c})$ and so it is
a real number. We deduce that $\zeta_d \in \mathbb{R}$, which implies that $d = 1$ or $d = 2$. If
$d = 1$, then $a$ is the $n$th power of a rational number and we are done (as $a$ is
an integer, this rational number is actually an integer).


¹⁶The Jacobi symbol $\left(\frac{a}{m}\right)$ for odd $m \ge 3$, defined as the product over the prime factors of
$m$ with multiplicity of the corresponding Legendre symbols, detects quadratic nonresidues
mod $m$ and is easily seen to obey quadratic reciprocity. The Chinese Remainder Theorem
then permits the construction of a “bad” value of $m$.

¹⁷To be fair, we need the obvious version of this theorem in which $\mathbb{Q}$ is replaced by $K_1$
and $K$ by $K_2 = K_1(\sqrt[n]{a})$. We leave to the reader to convince himself that the statement and
proof of this general version are exactly the same.

Suppose now that $d = 2$ and write $n = 2m$, so that $a = c^m$ is an $m$th power.
Note that $a$ is also a perfect square by the discussion in the first paragraph.
If $m$ is odd, then $a$ is an $m$th power and a perfect square, hence it is an
$n = 2m$th power and we are done. Suppose that $m$ is even, so $4 \mid n$. Then
$a$ is a 4th power modulo all sufficiently large primes and, by what we have
already seen, we can write $a = c_1^2$ with $c_1 > 0$ and $\mathbb{Q}(\sqrt{c_1}) \subset \mathbb{Q}(\zeta_4) = \mathbb{Q}(i)$. Since $c_1 > 0$,
we deduce that $\mathbb{Q}(\sqrt{c_1}) \subset \mathbb{R} \cap \mathbb{Q}(i) = \mathbb{Q}$, so $c_1$ is a square and $a$ is a fourth
power. If $m = 2k$ with $k$ odd, this implies that $a$ is a $\mathrm{lcm}(m, 4) = n$th
power and we are done again. Finally, assume that $n = 2^i \cdot u$, with $i \ge 3$
and $u$ odd. Since $a$ is a $2^i$th power modulo all sufficiently large primes,
lemma 9.B.61 implies that $\mathbb{Q}(\sqrt{c}) \subset \mathbb{Q}(\zeta_{2^i})$ (note that $\sqrt[2^i]{a} = c^{u/2}$ and $u$ is odd).
The extension $\mathbb{Q}(\zeta_{2^i})/\mathbb{Q}$ is Galois,
with Galois group $(\mathbb{Z}/2^i\mathbb{Z})^*$. Its sub-extensions which are quadratic over $\mathbb{Q}$
are classified by the subgroups with two elements of $(\mathbb{Z}/2^i\mathbb{Z})^*$, which in turn
are classified by the nontrivial solutions to the equation $x^2 \equiv 1$ (mod $2^i$).
There are three such solutions, giving rise to the quadratic sub-extensions
$\mathbb{Q}(\sqrt{-1})$, $\mathbb{Q}(\sqrt{2})$, $\mathbb{Q}(\sqrt{-2})$. Since $\sqrt{c}$ is real, we deduce that $\mathbb{Q}(\sqrt{c}) \subset \mathbb{Q}(\sqrt{2})$
and so $c$ is either a perfect square or twice a perfect square. As $a = c^{\frac{n}{2}}$, this
finally yields the desired result. □
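
The count of nontrivial solutions of $x^2 \equiv 1$ (mod $2^i$) used at the end of the proof can be confirmed by brute force. This is only a small sanity check; the function name is our own.

```python
def nontrivial_square_roots_of_one(i):
    """Residues x != 1 modulo 2^i with x^2 = 1 (mod 2^i)."""
    modulus = 2 ** i
    return [x for x in range(2, modulus) if x * x % modulus == 1]


for i in range(3, 8):
    print(i, nontrivial_square_roots_of_one(i))
```

For every $i \ge 3$ the three solutions are $-1$ and $2^{i-1} \pm 1$ modulo $2^i$, for example 3, 5, 7 modulo 8.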


Chapter 10 


Arithmetic Properties of 
Polynomials 


This chapter deals with elementary, but rather challenging problems concerning
the arithmetic of polynomials with integer coefficients. The techniques
are very diverse, going from the very basic fact that $a - b$ divides $f(a) - f(b)$ for
$f \in \mathbb{Z}[X]$ and $a, b \in \mathbb{Z}$ to Mahler expansions, finite differences, p-adic analysis,
finite fields, etc.


10.1 The $a - b \mid f(a) - f(b)$ trick


One can extract a lot of arithmetic information from the fact that a — b 
divides f(a) — f(b) for all polynomials with integer coefficients f, but often 
one needs a lot of inventiveness to do this. The next problems are all quite 
tricky, even though they hide very simple ideas. The first problem is a version 
of a very classical result, stating that there are no nonconstant polynomials 
all of whose values are prime numbers. 


1. Let $f_1, f_2, \ldots, f_k$ be nonconstant polynomials with integer coefficients.
Prove that for infinitely many $n$ all numbers $f_1(n), f_2(n), \ldots, f_k(n)$ are
composite.


Proof. Let $g = f_1 f_2 \cdots f_k$ and fix a positive integer $n_0$. Since none of the
polynomials $g$ and $f_i$ is constant, there are $a, b > n_0$ such that

$$\min(|g(a)|, |f_1(a)|, \ldots, |f_k(a)|) > 1 \quad \text{and} \quad |f_i(a + |g(a)|b)| > |g(a)|$$

for all $i$. Since $|f_i(a)|$ divides $f_i(a + |g(a)|b)$ and is less than $|f_i(a + |g(a)|b)|$, it
follows that all numbers $f_i(a + |g(a)|b)$ are composite. The result follows. □
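
The proof is constructive, and the construction can be run directly. The sketch below is a minimal illustration: the function `composite_point`, the way it searches for $a$ and $b$, and the two example polynomials are our own assumptions. It returns $a$ and $n = a + |g(a)| b$ such that each $|f_i(a)| > 1$ is a proper divisor of $f_i(n)$.

```python
from math import prod


def evaluate(coeffs, x):
    """Horner evaluation; coeffs = [c0, c1, ...] with the lowest degree first."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def composite_point(polys, n0):
    """Find a > n0 and n = a + |g(a)| * b with b > n0, following the proof."""
    a = n0 + 1
    while True:
        values = [abs(evaluate(f, a)) for f in polys]
        if min(values) > 1:
            g = prod(values)  # |g(a)| = |f_1(a)| * ... * |f_k(a)|
            b = n0 + 1
            while not all(abs(evaluate(f, a + g * b)) > g for f in polys):
                b += 1
            return a, a + g * b
        a += 1


polys = [[1, 0, 1], [2, 1]]  # hypothetical examples: X^2 + 1 and X + 2
a, n = composite_point(polys, 0)
print(a, n, [evaluate(f, n) for f in polys])  # 1 7 [50, 9]
```

Here $f_1(1) = 2$ divides $f_1(7) = 50$ and $f_2(1) = 3$ divides $f_2(7) = 9$, so both values at $n = 7$ are composite.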


The following two problems deal with periodic points of polynomials with 
integer coefficients. 


2. Let $f \in \mathbb{Z}[X]$ and let $n \ge 3$. Prove that one cannot find distinct integers
$x_1, x_2, \ldots, x_n$ such that $f(x_i) = x_{i-1}$, $i = 1, 2, \ldots, n$, indices being taken
mod $n$.


Proof. If such a polynomial $f$ and integers $x_i$ existed,

$$x_i - x_{i-1} = f(x_{i+1}) - f(x_i)$$

would be divisible by $x_{i+1} - x_i$. In particular, $|x_i - x_{i-1}| \ge |x_{i+1} - x_i|$ for all
$i$, which forces the numbers $|x_i - x_{i-1}|$ to be equal. Since

$$\sum_{i=1}^{n} (x_{i+1} - x_i) = 0,$$

there exists $j$ such that $x_{j+1} - x_j$ and $x_{j+2} - x_{j+1}$ have different signs. But
then $x_j = x_{j+2}$, which contradicts the hypothesis that the $x_i$ are all distinct. The
result follows. □
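
The absence of cycles of length at least 3 can be observed experimentally. The following sketch is a brute-force search under arbitrary assumptions of ours (polynomials of degree at most 3 with coefficients between $-2$ and $2$, starting points between $-20$ and $20$, cycle lengths up to 6); it records the exact period of every periodic integer point it finds.

```python
from itertools import product


def evaluate(coeffs, x):
    """Horner evaluation; coeffs = (c0, c1, ...) with the lowest degree first."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def exact_period(coeffs, x, max_len=6, escape=10 ** 6):
    """Smallest n <= max_len with f^n(x) = x, or None if no such n is found."""
    y = x
    for n in range(1, max_len + 1):
        y = evaluate(coeffs, y)
        if y == x:
            return n
        if abs(y) > escape:  # the orbit has left the search window
            return None
    return None


periods = set()
for coeffs in product(range(-2, 3), repeat=4):
    for x in range(-20, 21):
        n = exact_period(coeffs, x)
        if n is not None:
            periods.add(n)
print(periods)  # {1, 2}
```

Only fixed points and 2-cycles appear, for instance the 2-cycle $1 \to -1 \to 1$ of $f = -X$.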


3. Let $f \in \mathbb{Z}[X]$ be a polynomial of degree $n \ge 2$. Prove that the polynomial
$f(f(X)) - X$ has at most $n$ integer zeros.


Gh. Eckstein, Romanian TST


Proof. Suppose that $x_1 < x_2 < \cdots < x_{n+1}$ are distinct integers such that
$f(f(x_i)) = x_i$. Then the $f(x_i)$ are also pairwise distinct and $f(x_i) - f(x_j)$ is a
multiple of $x_i - x_j$. But $x_i - x_j = f(f(x_i)) - f(f(x_j))$ is also a multiple of
$f(x_i) - f(x_j)$. Thus, we must have $|f(x_i) - f(x_j)| = |x_i - x_j|$. But then

$$\sum_{i=1}^{n} |f(x_{i+1}) - f(x_i)| = \sum_{i=1}^{n} |x_{i+1} - x_i| = x_{n+1} - x_1 = |f(x_{n+1}) - f(x_1)| = \left|\sum_{i=1}^{n} (f(x_{i+1}) - f(x_i))\right|,$$

which implies that all numbers $f(x_{i+1}) - f(x_i)$ have the same sign. Combined
with the equality $|f(x_i) - f(x_{i+1})| = |x_i - x_{i+1}|$ this shows that either all
numbers $f(x_i) + x_i$ are equal or that the $f(x_i) - x_i$ are all equal. This is however
impossible, as $f$ has degree $n \ge 2$: one of the polynomials $f(X) + X - c$, $f(X) - X - c$
would have $n + 1$ distinct roots. □


Remark 10.1. A very similar problem was given at the IMO 2006: let $P(X)$ be
a polynomial of degree $n > 1$ with integer coefficients and let $k$ be a positive
integer. Consider the polynomial

$$Q(X) = \underbrace{P(P(\ldots P(P(X)) \ldots))}_{k \text{ times}}.$$

Prove that there are at most $n$ integers $t$ such that $Q(t) = t$. It follows from
problem 2 that any integral solution of the equation $Q(t) = t$ is a solution of
the equation $P(P(t)) = t$, so this new problem is actually equivalent to the
previously discussed one.


The following two problems are a bit trickier than the previous ones. 


4. Find all integers $n > 1$ for which there is a polynomial $f \in \mathbb{Z}[X]$ with
the property: for any integer $k$ one has $f(k) \equiv 0$ (mod $n$) or $f(k) \equiv 1$
(mod $n$) and both these equations have solutions.


Proof. The answer is: exactly the powers of a prime number. If $n$ is a prime
power, one can see from Euler’s theorem that the polynomial $X^{\varphi(n)}$ is a solution.
Conversely, suppose there exists a polynomial $f$ with the properties
in the statement and that $n$ has at least two different prime factors $p, q$.
Changing $f$ to $f(X - a)$ for a suitable $a$, we may assume that $f(0) \equiv 0$
(mod $n$). Thus $f(0) \equiv 0$ (mod $q$), which implies that for all $b \in \mathbb{Z}$ we also
have $f(bq) \equiv 0$ (mod $q$). Thus, we cannot have $f(bq) \equiv 1$ (mod $n$) and so
$f(bq) \equiv 0$ (mod $n$). In particular, $f(bq) \equiv 0$ (mod $p$). But then for all $a \in \mathbb{Z}$
we also have $f(ap + bq) \equiv f(bq) \equiv 0$ (mod $p$) and so $f(ap + bq) \equiv 0$ (mod $n$).
Since any integer $x \in \mathbb{Z}$ can be written $ap + bq$ for suitable $a, b$, it follows that
the equation $f(x) \equiv 1$ (mod $n$) has no solutions, a contradiction. □
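
Both halves of the answer can be illustrated numerically. The sketch below uses helper names of our own and a naive totient; it computes the set of values of $X^{\varphi(n)}$ modulo $n$. For prime powers this set is $\{0, 1\}$, while for $n = 6$ this particular polynomial already takes other values (the proof shows that no polynomial at all can work there).

```python
from math import gcd


def phi(n):
    """Euler's totient function, computed naively."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def values_mod(n):
    """Set of residues of x^phi(n) modulo n, for x running over all residues."""
    return {pow(x, phi(n), n) for x in range(n)}


for n in (4, 8, 9, 25, 27, 49, 6):
    print(n, sorted(values_mod(n)))
```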


5. Find all polynomials $f$ with integer coefficients such that $f(n) \mid 2^n - 1$ for
all positive integers $n$.


Polish Olympiad


Proof. Of course, if $f$ is constant, then $f$ has to be either 1 or $-1$ and these are
solutions. So, assume that $f$ is nonconstant. We may assume that the leading
coefficient of $f$ is positive. Take $n$ such that $f(n) > 1$ and let $p$ be a prime
factor of $f(n)$. Then $p \mid 2^n - 1$ and also $p \mid f(n + p) \mid 2^{n+p} - 1$. But then $p \mid 2^p - 1$,
which is certainly impossible, since $2^p \equiv 2$ (mod $p$) by Fermat’s little theorem.
So any such $f$ is constant and the solutions are $1, -1$. □


6. Let $f \in \mathbb{Z}[X]$ be a nonconstant polynomial. Prove that the sequence
$f(3^n)$ (mod $n$) is not bounded.


Proof. Changing $f$ with its opposite, we may assume that the leading coefficient
of $f$ is positive. So, given $N$, there exists $m$ such that $f(3^m) > N$.
Choose a prime $p > 2f(3^m)$ and observe that $f(3^{pm}) \equiv f(3^m)$ (mod $p$) by Fermat’s
little theorem. In particular, if $r = f(3^{mp})$ (mod $mp$), then $r \equiv f(3^m)$
(mod $p$) and so $r \ge f(3^m) > N$, finishing the proof. □


7. Is there a nonconstant polynomial $f \in \mathbb{Z}[X]$ and an integer $a > 1$ such
that the numbers $f(a), f(a^2), f(a^3), \ldots$ are pairwise relatively prime?

Saint Petersburg 1998

Proof. Assume that $f$ is such a polynomial and $a$ is as in the statement.
Let $g = \gcd(a, f(a^i)) = \gcd(a, f(0))$. If $g > 1$, then $g$ is a common factor
of all $f(a^i)$. Thus $g = 1$, so $a$ and $f(a^i)$ are relatively prime for all
$i$. Choose an $i$ with $|f(a^i)| > 1$ and choose $j = i + \varphi(|f(a^i)|)$, where $\varphi$
is Euler’s totient function. Then $f(a^i)$ divides $a^j - a^i$ by Euler’s Theorem,
so $f(a^i)$ divides $f(a^j) - f(a^i)$ and $f(a^i) \mid f(a^j)$. But this contradicts the fact
that $\gcd(f(a^i), f(a^j)) = 1$ and shows that the answer to the problem is negative. □
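
The choice of the index $j$ in this proof can be followed on a small example. In the sketch below the polynomial $f = X + 2$, the base $a = 3$ and the starting index $i = 1$ are hypothetical choices of ours (note that $\gcd(a, f(0)) = 1$); the code computes $j = i + \varphi(|f(a^i)|)$ and the common factor of $f(a^i)$ and $f(a^j)$.

```python
from math import gcd


def phi(n):
    """Euler's totient function, computed naively."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def f(x):
    """Hypothetical example polynomial f(X) = X + 2."""
    return x + 2


a, i = 3, 1
j = i + phi(abs(f(a ** i)))          # f(3) = 5, phi(5) = 4, so j = 5
common = gcd(f(a ** i), f(a ** j))   # f(3^5) = 245 = 5 * 49
print(j, common)  # 5 5
```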


The next problem is a variation on the following very classical (but non- 
trivial) problem: what are the polynomials f € Z[X] such that f(p) is a prime 
for all prime numbers p? 


8. Find all polynomials f with integer coefficients, having the following 
property: there exists k such that for all primes p, f(p) has at most k 
prime factors. 


Proof. All polynomials of the form $aX^m$ work, for obvious reasons. Let $f$ be
a solution of the problem and suppose $f$ is not of this form. So we can write
$f(X) = X^m g(X)$ for some polynomial $g$ with integer coefficients such that
$g(0) \ne 0$ and $g$ is not constant. By Schur’s theorem, there are infinitely many
primes dividing at least one of the numbers $g(1), g(2), \ldots$. In particular, we can
choose $p_1, p_2, \ldots, p_{k+1}$ distinct primes greater than $|g(0)|$ and we can choose
positive integers $a_1, a_2, \ldots, a_{k+1}$ such that $g(a_i) \equiv 0$ (mod $p_i$). Using the Chinese
Remainder Theorem, we can find an integer $a$ such that $a \equiv a_i$ (mod $p_i$)
for all $i$. Note that $a$ is relatively prime to $p_1 p_2 \cdots p_{k+1}$ (otherwise some $p_i$
would divide $a$, so $p_i \mid a_i$ and then $p_i \mid g(0)$, a contradiction). Thus, by Dirichlet’s
theorem, there exist infinitely many primes $p \equiv a$ (mod $p_1 p_2 \cdots p_{k+1}$). In
particular, we can choose such a prime $p$ with $f(p) \ne 0$. Since by construction
$f(p)$ is a multiple of $p_1 p_2 \cdots p_{k+1}$, it follows that $f$ is not a solution of the
problem. The result follows. □


Proof. Note that all polynomials of the form $f(X) = aX^m$ work. Suppose
$f(X) = X^m g(X)$ is such a polynomial which is not of this form. Then $g(X)$
is also such a polynomial. Thus we may assume that $f$ is nonconstant and
$f(0) \ne 0$. Let $p$ be a prime not dividing $f(0)$, hence $p$ does not divide $f(p)$.
By Dirichlet’s Theorem there are infinitely many primes $q$ of the form $q =
p + k f(p)^2$. Choose such a $q$ with $|f(q)| > |f(p)|$. For such a $q$ we have
$f(q) \equiv f(p)$ (mod $f(p)^2$). Thus $f(q)$ is divisible by every prime factor of $f(p)$,
with exactly the same multiplicities, and at least one additional prime factor.
Iterating this construction, we can find a sequence of primes $(p_n)$ for which
$f(p_n)$ has at least $n$ prime factors. Thus the only solutions are the ones already
given. □


9. Find all polynomials f € Z[X] with the property that for any relatively 
prime integers m,n, the numbers f(m), f(n) are also relatively prime. 


Iranian TST 


Proof. The solutions are the polynomials $\pm X^d$ for $d \ge 0$. Trivially, these are
solutions of the problem. Consider now any nonconstant solution $f$. Suppose
that $p$ is a large prime for which $p$ does not divide $f(p)$. Then $p$ and $p + f(p)$ are
relatively prime and thus $f(p)$ and $f(p + f(p))$ are relatively prime. However,
$f(p)$ divides $f(p + f(p))$. Thus necessarily $|f(p)| = 1$ or $f(p) = 0$. Since $p$ was
large, none of this happens ($f$ is nonconstant). Thus, for $p$ large enough we
have $p \mid f(p)$, forcing $p \mid f(0)$. This implies that $f(0) = 0$ and so we can write
$f(X) = Xg(X)$ for some polynomial $g$ with integral coefficients. Of course, $g$
satisfies the same property as $f$ and has smaller degree. By an easy induction
on the degree of $f$, we deduce that $f$ is indeed of the form $\pm X^d$ for some $d$. □


Proof. We prove first the following easy lemma: 


Lemma 10.2. If $f \in \mathbb{Z}[X]$ is not of the form $\pm X^n$, then there exist different
primes $p, q$ such that $q \mid f(p)$.


Proof. Assuming the contrary, we may assume that the leading coefficient of
$f$ is positive. So, for all large enough $p$, $f(p) = p^{k_p}$ for some integer $k_p$. Since
$k_p = \frac{\log f(p)}{\log p}$ converges to $\deg(f)$, it follows that $k_p = \deg(f)$ for all sufficiently
large $p$ and so $f(p) = p^{\deg(f)}$ for all sufficiently large $p$. The result follows. □


Coming back to the proof, choose a solution $f$ of the problem and suppose
that $f$ is not of the form $\pm X^n$. Pick primes $p, q$ as in the lemma. Then $q$ divides
$f(p)$ and $f(p + q)$ (since $q$ divides $f(p + q) - f(p)$). But this is impossible,
since $p, p + q$ are relatively prime and so $f(p)$ and $f(p + q)$ are relatively prime.
The result follows. □


It is a classical fact that for any integer $a$ and any positive integers $m$ and
$n$ we have
$$\gcd(a^m - 1, a^n - 1) = a^{\gcd(m,n)} - 1.$$

A sequence $(a_n)_n$ with the property that $\gcd(a_m, a_n) = a_{\gcd(m,n)}$ is called
a Mersenne sequence. Thus $(a^n - 1)_n$ is a Mersenne sequence. In chapter
3, problem 3 we proved that for any Mersenne sequence $a_n$ there exists a
sequence of integers $b_n$ such that $a_n = \prod_{d \mid n} b_d$. The next problem shows
how to construct a Mersenne sequence by iterating a polynomial with integer
coefficients.


10. Let $f$ be a polynomial with integer coefficients and let $a_0 = 0$ and
$a_n = f(a_{n-1})$

for all $n \ge 1$. Prove that $(a_n)_{n \ge 0}$ is a Mersenne sequence.
Romanian TST


Proof. Let $g = f^d$, the composite of $f$ with itself $d$ times. If $m = du$ and
$n = dv$, with $\gcd(u, v) = 1$, are any positive integers, note that $a_n = g^v(0)$,
$a_m = g^u(0)$ and $a_d = g(0)$. So, it is enough to prove that for any $g$ with
integer coefficients and any $\gcd(u, v) = 1$ we have $\gcd(g^u(0), g^v(0)) = g(0)$.
Let us remark that for any polynomial $h$ with integer coefficients and any $k$
we have $h(0) \mid h^k(0)$. This is obvious. Applying this to $g$ already shows that
$g(0)$ divides $\gcd(g^u(0), g^v(0))$. Conversely, let $x = \gcd(g^u(0), g^v(0))$. Applying
the previous remark to $h = g^u$ and $h = g^v$, we obtain that for all $A, B \ge 1$ we
have $x \mid g^{Au}(0)$ and $x \mid g^{Bv}(0)$. Taking $A, B$ such that $Bv = Au + 1$, we obtain
that $x \mid g(g^{Au}(0))$ and $x \mid g^{Au}(0)$. It is then clear that $x$ divides $g(0)$ and the
conclusion follows. □
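
The Mersenne property is easy to test on an example. The following sketch uses the polynomial $f = X^2 + 2$ and the range of indices as arbitrary choices of ours; it iterates $f$ starting from 0 and compares $\gcd(a_m, a_n)$ with $a_{\gcd(m,n)}$ (absolute values are taken because a gcd is only defined up to sign).

```python
from math import gcd


def iterates(f, count):
    """Return [a_0, a_1, ..., a_count] with a_0 = 0 and a_n = f(a_{n-1})."""
    a = [0]
    for _ in range(count):
        a.append(f(a[-1]))
    return a


a = iterates(lambda x: x * x + 2, 8)  # 0, 2, 6, 38, 1446, ...
is_mersenne = all(
    gcd(a[m], a[n]) == abs(a[gcd(m, n)])
    for m in range(1, 9)
    for n in range(1, 9)
)
print(a[:5], is_mersenne)  # [0, 2, 6, 38, 1446] True
```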


11. Find all positive integers $k$ such that if a polynomial with integer coefficients
$f$ satisfies

$$0 \le f(0), f(1), \ldots, f(k + 1) \le k,$$

then $f(0) = f(1) = \cdots = f(k) = f(k + 1)$.
IMO 1997 Shortlist




Proof. If $k \le 2$ we can easily find counterexamples, for instance $f(X) =
X(2 - X)$ for $k = 1$ and $f(X) = X(3 - X)$ for $k = 2$. So, assume that $k \ge 3$
and let $f$ be a polynomial with integral coefficients such that
$$0 \le f(0), f(1), \dots, f(k+1) \le k.$$
As $f(k+1) - f(0)$ is between $-k$ and $k$ and since it is a multiple of $k+1$, we
must have $f(0) = f(k+1)$. Thus, we can write
$$f(X) - f(0) = X(X - (k+1))g(X)$$
for some polynomial $g$, which clearly has integer coefficients (note that $X$
divides $f(X) - f(0)$ and $X$ is relatively prime to $X - (k+1)$).

Thus, $f(0) + i(i - k - 1)g(i)$ is between $0$ and $k$ for all $0 \le i \le k+1$.
In particular, we have $k \ge i(k+1-i)|g(i)|$ for $i$ in this range. However,
$i(k+1-i) > k$ for $2 \le i \le k-1$, so that $g(i) = 0$ for $2 \le i \le k-1$. In
particular, we can write
$$g(X) = (X - 2)\cdots(X - k + 1)h(X)$$
for some polynomial $h$, again with integer coefficients. But then
$$f(k) - f(0) = -k(k-2)!\,h(k), \qquad f(1) - f(0) = (-1)^{k-1}k(k-2)!\,h(1).$$
This implies that $k(k-2)!\,|h(x)| \le k$ for $x \in \{1, k\}$, which clearly implies that
$h(1) = h(k) = 0$, unless $k = 3$. Therefore, unless $k = 3$ we can definitely
conclude that $f(0) = f(1) = \dots = f(k+1)$ and so all $k \ge 4$ are solutions
of the problem. It remains to study the case $k = 3$. But we note that we
can actually have equality in all previous inequalities, yielding without much
difficulty the polynomial $f(X) = X(X - 2)^2(4 - X)$. This proves that $k = 3$
is not a solution and the problem is solved. □
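
The three counterexamples used in this proof can be checked directly. The sketch below simply tabulates the values $f(0), \dots, f(k+1)$; the helper name `values_in_range` is hypothetical.

```python
def values_in_range(f, k):
    """Return [f(0), ..., f(k+1)] if every value lies in [0, k], else None."""
    vals = [f(x) for x in range(k + 2)]
    return vals if all(0 <= v <= k for v in vals) else None

# The counterexamples quoted in the proof for k = 1, 2, 3.
counterexamples = {
    1: lambda x: x * (2 - x),
    2: lambda x: x * (3 - x),
    3: lambda x: x * (x - 2) ** 2 * (4 - x),
}
table = {k: values_in_range(f, k) for k, f in counterexamples.items()}
```

In each case the values stay in the allowed range without being constant, so $k = 1, 2, 3$ are indeed not solutions.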


Fermat's little theorem implies that $p$ divides $2^p - 2$ for any prime $p$. The
following question is then rather natural to ask, but not easy to answer.


12. Find all polynomials $f$ with integer coefficients such that $f(p) \mid 2^p - 2$ for
any odd prime number $p$.


Gabriel Dospinescu, Peter Scholze 


10.1. The $a - b \mid f(a) - f(b)$ trick


Proof. We will use the fact that $f(p)$ divides $f(p + kf(p))$ for any $k$ and choose
$k$ such that $p + kf(p)$ is again a prime. Of course, this is not possible unless
$p$ is relatively prime to $f(p)$.

Since $f(X) = X$ is a solution (by Fermat's little theorem), a reasonable
first step will be to prove that any solution satisfies $f(0) = 0$. This is the
hardest step. Assume for a moment that we proved that any nonconstant
solution satisfies $f(0) = 0$, then we can write $f(X) = Xg(X)$ and of course $g$
satisfies the same conditions. So either $g$ is constant or it is a multiple of $X$.
Continuing like this we obtain that $f(X) = \pm 2^a X^b$ for some $a, b \ge 0$. Clearly
$a \le 1$ and by taking $p = 3$ we obtain that $b \le 1$. Thus, all solutions are of the
form $f(X) = \pm 2^a X^b$ with $a, b \le 1$ (these are clearly solutions of the problem).


Now, let us prove that if $f$ is nonconstant, then $f(0) = 0$. Assume that
$f(0) \ne 0$ and, by changing $f$ with $-f$, that the leading coefficient of $f$ is
positive. Then for large primes $p$, $p$ is relatively prime to $f(p)$ (as if $p$ divides
$f(p)$, then $p$ divides $f(0)$). By Dirichlet's theorem for such a prime $p$ we
can choose infinitely many $k \ge 1$ (depending on $p$, of course) such that $q =
p + kf(p)$ is again a prime. Then $f(p)$ divides $2^p - 2$ and also $2^{p + kf(p)} - 2$. But
then $f(p)$ also divides $2(2^{\gcd(p-1,\, p-1+kf(p))} - 1)$. As $f(p) \equiv f(1) \pmod{p-1}$,
we deduce that $f(p)$ divides $2(2^{\gcd(p-1,\, kf(1))} - 1)$. Now, if $f(1) = 0$, all this
work will accomplish nothing, but the good news is that $f(1)$ cannot be $0$:
otherwise, $f(p)$ would be a multiple of $p - 1$ and so $p - 1$ would divide $2^p - 2$
for any prime $p$. This is of course false (take $p = 5$). Thus $f(1) \ne 0$. However,
there is a big obstacle in the previous approach: $k$ depends on $p$ and we have
absolutely no control on it. The crucial remark is that we can however control
$\gcd(p - 1, kf(1))$ by choosing $k$ more carefully. This is the object of the next
paragraph.

We will choose $k$ of the form $k = 2 + r(p - 1)$ for suitable integers $r$. In
order to do that, we need to ensure that $p + f(p)(2 + r(p - 1))$ is a prime. This
is realized for infinitely many $r$ if $p + 2f(p)$ is relatively prime to $(p - 1)f(p)$.
Now, if $l$ is a prime factor of $p + 2f(p)$ that divides $(p - 1)f(p)$, then we cannot
have $l \mid f(p)$ (as otherwise $l$ divides $p$ and so $l = p$ divides $f(p)$, a contradiction),
so $l$ divides $p - 1$ and also $1 + 2f(1)$ (since $f(p) \equiv f(1) \pmod{p-1}$). But we
can now do the following: choose from the beginning large primes $p$ such that
$p - 1$ is relatively prime to $1 + 2f(1)$, for instance primes $p \equiv 2 \pmod{1 + 2f(1)}$.
There are infinitely many such primes by Dirichlet's theorem. For such $p$, we
saw that $p + 2f(p)$ is relatively prime to $(p - 1)f(p)$, so we can indeed choose
$k$ of the form $2 + r(p - 1)$ such that $q = p + kf(p)$ is a prime. The previous
paragraph shows that $f(p)$ divides $2(2^{\gcd(p-1,\, kf(1))} - 1)$. But $\gcd(k, p - 1) = 2$,
so that $\gcd(p - 1, kf(1))$ divides $2f(1)$ and $f(p)$ actually divides
$2(2^{2|f(1)|} - 1)$. But since $f(p)$ can take arbitrarily
large values and since $f(1) \ne 0$, this is a (so desired) contradiction. This shows
that $f(0) = 0$ and then the problem is solved by the first paragraph. □


10.2 Derivatives and p-adic Taylor expansions 


Most of the time when working with congruences mod $p$ it is enough to
know that $f(a) \equiv f(b) \pmod p$ whenever $a \equiv b \pmod p$. However, when
dealing with congruences with prime power moduli, one is sometimes obliged
to use more precise estimates. Assume that $f \in \mathbb{Z}[X]$ has degree $n$ and
consider its Taylor expansion
$$f(x + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x)}{k!} h^k,$$
which is valid for any complex numbers $x, h$. Note that $\frac{f^{(k)}(x)}{k!}$ is an integer
whenever $x$ is an integer. Indeed, by linearity it suffices to prove this when
$f(X) = X^m$ and in this case
$$\frac{f^{(k)}(X)}{k!} = \binom{m}{k} X^{m-k}.$$
It follows easily from this that we have an equality in $\mathbb{Z}[X, Y]$ (thus in $R[X, Y]$ for
any commutative ring $R$)
$$f(X) = \sum_{k=0}^{n} \frac{f^{(k)}(Y)}{k!} (X - Y)^k.$$
In particular, we have the very useful congruence $f(a + b) \equiv f(a) + bf'(a)
\pmod{b^2}$ whenever $f \in \mathbb{Z}[X]$ and $a, b \in \mathbb{Z}$. We give a few applications of these
ideas in this section.
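
As a quick sanity check of the congruence $f(a+b) \equiv f(a) + bf'(a) \pmod{b^2}$ and of the integrality of the coefficients of $f^{(k)}/k!$, here is a minimal sketch. The sample polynomial and the helper names `taylor_coeffs` and `evaluate` are assumptions made for illustration; polynomials are stored as coefficient lists, lowest degree first.

```python
from math import comb

def taylor_coeffs(coeffs, k):
    """Coefficients of f^(k)/k! for f = sum coeffs[j] * X^j (lowest degree first)."""
    return [comb(j, k) * c for j, c in enumerate(coeffs)][k:]

def evaluate(coeffs, x):
    """Horner evaluation of a coefficient list (lowest degree first)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

f = [3, -2, 0, 5, 1]          # sample: 3 - 2X + 5X^3 + X^4
f_prime = taylor_coeffs(f, 1)  # coefficients of f'

# f(a + b) - f(a) - b f'(a) should be a multiple of b^2 for all integers a, b != 0.
congruence_ok = all(
    (evaluate(f, a + b) - evaluate(f, a) - b * evaluate(f_prime, a)) % (b * b) == 0
    for a in range(-6, 7)
    for b in range(-6, 7)
    if b != 0
)

# The full Taylor expansion, using only integer arithmetic.
taylor_ok = all(
    evaluate(f, a + b)
    == sum(evaluate(taylor_coeffs(f, k), a) * b**k for k in range(len(f)))
    for a in range(-6, 7)
    for b in range(-6, 7)
)
```

No division appears anywhere: the binomial coefficient $\binom{j}{k}$ absorbs the factor $1/k!$, which is exactly why the expansion makes sense over any commutative ring.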




13. Let $p$ be a prime and let $f \in \mathbb{Z}[X]$ be a polynomial. If $f(0), f(1), \dots,
f(p^2 - 1)$ give distinct remainders when divided by $p^2$, prove that the
numbers $f(0), f(1), \dots, f(p^3 - 1)$ give distinct remainders when divided
by $p^3$.
Putnam 2008 


Proof. Assume that $f(i) \equiv f(j) \pmod{p^3}$ for some $i, j$. Since $f(i) \equiv f(j)
\pmod{p^2}$ and since $f$ is injective mod $p^2$, we deduce that $i \equiv j \pmod{p^2}$, say
$j = i + p^2k$. It is enough to prove that $k \equiv 0 \pmod p$. Assume that this is
not the case. We have
$$f(i) \equiv f(j) = f(i + kp^2) \equiv f(i) + kp^2f'(i) \pmod{p^3},$$
so $p$ divides $kf'(i)$, hence $p$ divides $f'(i)$. But then
$$f(i + kp) \equiv f(i) + kpf'(i) \equiv f(i) \pmod{p^2},$$
which, combined with the hypothesis, yields $i + kp \equiv i \pmod{p^2}$, a contradiction.
Thus $k \equiv 0 \pmod p$ and $i \equiv j \pmod{p^3}$. The result follows. □
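
The result can also be observed experimentally. The sketch below takes $p = 3$ and runs over all polynomials of degree at most $3$ with coefficients in $\{0, \dots, 8\}$ (an arbitrary finite sample, chosen only to keep the search small); whenever such a polynomial is injective mod $9$, it checks injectivity mod $27$. The function name `is_injective_mod` is hypothetical.

```python
from itertools import product

def is_injective_mod(coeffs, modulus):
    """True if x -> f(x) mod modulus is injective on 0, 1, ..., modulus - 1."""
    values = {
        sum(c * x**j for j, c in enumerate(coeffs)) % modulus
        for x in range(modulus)
    }
    return len(values) == modulus

p = 3
tested = 0        # number of sample polynomials injective mod p^2
violations = []   # polynomials injective mod p^2 but not mod p^3

for coeffs in product(range(p * p), repeat=4):
    if is_injective_mod(coeffs, p**2):
        tested += 1
        if not is_injective_mod(coeffs, p**3):
            violations.append(coeffs)
```

The list `violations` stays empty, as the problem predicts.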


14. Let P be a polynomial with integer coefficients such that P(0) = 0 and 
gcd(P(0), P(1), P(2),...) =1. 
Show that there are infinitely many n such that 
gcd(P(n) — P(0), P(n + 1) — P(1), P(n + 2) — P(2),...) =n. 
USA TST 2010 
Proof. Let us try to study first
$$d_n = \gcd(P(n) - P(0), P(n+1) - P(1), \dots)$$
for any polynomial $P$ with integer coefficients. Let $q$ be a prime factor of $d_n$,
so that $P(n+k) \equiv P(k) \pmod q$ for all $k$, i.e. $P$ is $n$-periodic modulo $q$. But
$P$ is also $q$-periodic modulo $q$. Thus, if $\gcd(q,n) = 1$, then $P$ is $1$-periodic
modulo $q$ (by Bézout's lemma) and so $q$ divides $P(j+1) - P(j)$ for all $j$.
Then $q$ divides $P(j) - P(0)$ for all $j$, so if $P(0) = 0$, then $q$ must divide
$\gcd(P(0), P(1), \dots)$. In particular, for our polynomial we must have $q \mid n$ for
any prime factor $q$ of $d_n$.

The previous paragraph suggests taking for $n$ a power of a prime, say
$n = p^m$. Then we saw that $d_n$ is also a power of $p$. Note that $d_n$ is a multiple
of $n$, since $n$ divides $P(n+k) - P(k)$ for all $k$. It remains to see if we can have
$p^{m+1} \mid P(k + p^m) - P(k)$ for all $k$. Since
$$P(k + p^m) \equiv P(k) + p^mP'(k) \pmod{p^{m+1}},$$
this would imply that $p$ divides $P'(k)$ for all $k$. Now we see how to choose our
numbers $n$: pick and fix once and for all a value $k$ such that $P'(k) \ne 0$. For
all sufficiently large $p$, $p$ does not divide $P'(k)$. For any such $p$, the previous
arguments show that $d_n = n$ for all $n = p^m$. The conclusion follows. □


Before tackling the next problem, we need to recall a very standard result,
known as Lagrange's theorem. Consider $f \in \mathbb{Z}[X]$ and a prime $p$. Let $\bar f \in
\mathbb{F}_p[X]$ be the reduction of $f$ mod $p$ and assume that $\bar f \ne 0$. As $\mathbb{F}_p$ is a field,
it follows that $\bar f$ has at most $\deg \bar f \le \deg f$ distinct roots in $\mathbb{F}_p$. Therefore,
if $\bar f \ne 0$ (which is equivalent to the fact that at least one coefficient of $f$ is
not a multiple of $p$), then the congruence $f(x) \equiv 0 \pmod p$ has at most $\deg f$
pairwise distinct solutions. The next result is a generalization of this classical
theorem; it will also be used in the solution of problem 20. This was one of
the problems proposed in the Iranian TST 2011, but the result is much older,
see for instance theorem 27, chapter II of the beautiful book [21].


15. Let $p$ be a prime, $k$ a positive integer and $f \in \mathbb{Z}[X]$ such that $p^k$
divides $f(x)$ for all $x \in \mathbb{Z}$. If $k \le p$, prove that there are polynomials
$g_0, g_1, \dots, g_k \in \mathbb{Z}[X]$ such that
$$f(X) = \sum_{i=0}^{k} p^{k-i}(X^p - X)^i \cdot g_i(X).$$


Proof. The proof is by induction on $k$. If $k = 1$, perform the division algorithm
in $\mathbb{Z}[X]$ for the polynomials $f$ and $X^p - X$ (which we can do, as $X^p - X$ is
monic) to find $q, r \in \mathbb{Z}[X]$ such that $f(X) = (X^p - X)q(X) + r(X)$ and
$\deg r < p$. Then $p$ divides $r(x)$ for all integers $x$ (by Fermat's little theorem
and the hypothesis) and the result follows from Lagrange's theorem. Assume
that the result holds for $k$ and that $k + 1 \le p$. If $p^{k+1}$ divides $f(x)$ for all $x$, then
by the inductive hypothesis we can write $f(X) = \sum_{i=0}^{k} p^{k-i}(X^p - X)^i \cdot g_i(X)$
for some $g_i \in \mathbb{Z}[X]$. Pick any integers $x$ and $z$ and write $x^p - x = py$ for some
integer $y$. Then $(x + pz)^p - (x + pz) \equiv p(y - z) \pmod{p^2}$, thus
$$f(x + pz) \equiv \sum_{i=0}^{k} p^k (y - z)^i g_i(x + pz) \equiv p^k \sum_{i=0}^{k} (y - z)^i g_i(x) \pmod{p^{k+1}}.$$
Therefore the hypothesis on $f$ implies that $p$ divides $\sum_{i=0}^{k} z^i g_i(x)$ for all
integers $z$. Using the fact that $k + 1 \le p$ and Lagrange's theorem, it follows
that $p$ divides $g_i(x)$ for all $i$ and all $x$. By the case $k = 1$ we can write
$g_i(X) = (X^p - X)h_i(X) + pr_i(X)$ for some $h_i, r_i \in \mathbb{Z}[X]$. Replacing these
expressions in $f(X) = \sum_{i=0}^{k} p^{k-i}(X^p - X)^i \cdot g_i(X)$ yields the desired result. □


10.3 Hilbert polynomials and Mahler expansions 


Consider the polynomial
$$\binom{X}{n} = \frac{X(X-1)\cdots(X-n+1)}{n!},$$
known as the nth Hilbert polynomial. It is clear that this polynomial has
rational (not necessarily integer) coefficients and it is not difficult to check that
it takes integer values at all integers (note that $x(x-1)\cdots(x-n+1) = n!\binom{x}{n}$
if $x \ge 1$, while
$$x(x-1)\cdots(x-n+1) = (-1)^n\, n!\binom{n-x-1}{n}$$
if $x \le 0$). These polynomials play a fundamental role in the theory of
integer-valued polynomials because of the following beautiful and classical result of
Pólya:




Theorem 10.3. Let $f$ be a polynomial of degree $n$, with real coefficients. Then
$f(\mathbb{Z}) \subset \mathbb{Z}$ if and only if there exist integers $a_0, a_1, a_2, \dots, a_n$ such that
$$f(X) = a_0 + a_1\binom{X}{1} + a_2\binom{X}{2} + \dots + a_n\binom{X}{n}.$$


Proof. One implication follows from the previous discussion, so assume that
$f(\mathbb{Z}) \subset \mathbb{Z}$. The polynomials $\binom{X}{0}, \binom{X}{1}, \dots, \binom{X}{n}$ have degrees $0, 1, \dots, n$, thus
they form a basis of the $\mathbb{R}$-vector space of real polynomials with degree at
most $n$. Thus there exist unique real numbers $a_0, a_1, \dots, a_n$ such that
$$f(X) = \sum_{k=0}^{n} a_k \binom{X}{k}.$$
Consider the operator $\Delta f(X) = f(X+1) - f(X)$ and observe that
$$\Delta\binom{X}{k} = \binom{X}{k-1}.$$
Applying $\Delta$ successively to the relation
$$f(X) = \sum_{k=0}^{n} a_k \binom{X}{k},$$
we deduce that $a_j = \Delta^j f(0)$ for all $j$. On the other hand, an immediate
induction shows that
$$\Delta^k f(X) = \sum_{j=0}^{k} (-1)^{k-j}\binom{k}{j} f(X+j).$$
Thus, if $f(0), f(1), \dots, f(n)$ are integers, then so are $a_0, a_1, \dots, a_n$. The result
follows. □
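
The proof is constructive: the coefficients $a_k = \Delta^k f(0)$ can be read off a table of finite differences of $f(0), \dots, f(n)$. The sketch below does this for the sample integer-valued polynomial $f(x) = x(x+1)/2$ (an illustrative choice); the function names are hypothetical.

```python
from math import comb, factorial

def mahler_coefficients(values):
    """Given [f(0), ..., f(n)], return [a_0, ..., a_n] with a_k = (Delta^k f)(0)."""
    coeffs, row = [], list(values)
    while row:
        coeffs.append(row[0])
        # Replace the row by its forward differences.
        row = [b - a for a, b in zip(row, row[1:])]
    return coeffs

def binom(x, k):
    """Value of the Hilbert polynomial binom(x, k) at any integer x."""
    num = 1
    for j in range(k):
        num *= x - j
    return num // factorial(k)  # exact: k consecutive integers are divisible by k!

def evaluate_mahler(coeffs, x):
    """Evaluate sum_k a_k * binom(x, k)."""
    return sum(a * binom(x, k) for k, a in enumerate(coeffs))

def f(x):
    # Sample polynomial with non-integer coefficients but integer values.
    return x * (x + 1) // 2

n = 2
a = mahler_coefficients([f(j) for j in range(n + 1)])

# The closed formula for a_k obtained in the proof.
closed_form = [
    sum((-1) ** (k - j) * comb(k, j) * f(j) for j in range(k + 1))
    for k in range(n + 1)
]
```

Here `a` comes out as `[0, 1, 1]`, i.e. $x(x+1)/2 = \binom{x}{1} + \binom{x}{2}$, and the expansion reproduces $f$ at negative integers as well.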


The proof of the previous theorem shows that
$$a_k = \sum_{j=0}^{k} (-1)^{k-j}\binom{k}{j} f(j).$$
We call $a_k$ the Mahler coefficients of $f$ and we refer the reader to the addendum
3.B for more details on these coefficients, which play an absolutely fundamental
role in $p$-adic analysis. Another consequence of the proof of theorem 10.3 is
the following very useful result.


Proposition 10.4. If $f$ is a polynomial of degree $n$ with coefficients in an
arbitrary commutative ring, then
$$\sum_{k=0}^{n+1} (-1)^{n+1-k}\binom{n+1}{k} f(X+k) = 0.$$

Proof. The left-hand side is $\Delta^{n+1}f(X)$. However, we have $\deg \Delta g < \deg g$ for
any nonzero polynomial $g$. We deduce that $\Delta^{n+1}f$ is the zero polynomial and
the result follows. □
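
Proposition 10.4 can be illustrated over the integers with a few lines of code. The sketch below uses a sample cubic (an arbitrary choice) and a hypothetical helper `finite_difference`; it also records the standard fact that $\Delta^n f$ is the constant $n!$ times the leading coefficient.

```python
from math import comb, factorial

def finite_difference(f, order, x):
    """Return (Delta^order f)(x) = sum_k (-1)^(order-k) * C(order, k) * f(x + k)."""
    return sum(
        (-1) ** (order - k) * comb(order, k) * f(x + k) for k in range(order + 1)
    )

def f(x):
    # Sample polynomial of degree n = 3 with leading coefficient 4.
    return 4 * x**3 - 7 * x + 1

n = 3
vanishing = [finite_difference(f, n + 1, x) for x in range(-10, 11)]
top_difference = {finite_difference(f, n, x) for x in range(-10, 11)}
```

Every entry of `vanishing` is $0$, while `top_difference` is the single value $3! \cdot 4 = 24$.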


The next problem is an immediate application of theorem 10.3. The reader 
might try to solve it in a different way and he will realize that the problem is 
actually quite tricky. 


16. Let n be a positive integer. What is the least degree of a monic polyno- 
mial f with integer coefficients such that n|f(k) for any integer k? 


Proof. Let $d = \deg f$. By theorem 10.3, applied to the integer-valued polynomial
$\frac{1}{n}f$, we can write $f(X) = n\sum_{k=0}^{d} a_k \binom{X}{k}$ for
some integers $a_k$. Considering the leading coefficients in this equality shows
that $d!$ is a multiple of $n$. On the other hand, if $d!$ is a multiple of $n$, then
we can take $f(X) = X(X+1)\cdots(X+d-1)$. Thus the answer is: $d$ is the
smallest integer such that $d!$ is a multiple of $n$. □
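
The answer is easy to compute. The following sketch, with hypothetical function names, finds the least $d$ with $n \mid d!$ and checks on a range of integers that the polynomial $X(X+1)\cdots(X+d-1)$ from the proof really takes values divisible by $n$.

```python
def least_degree(n):
    """Smallest d >= 0 such that d! is a multiple of n."""
    d, fact = 0, 1
    while fact % n != 0:
        d += 1
        fact *= d
    return d

def rising_product(x, d):
    """Value of X(X+1)...(X+d-1) at the integer x."""
    result = 1
    for j in range(d):
        result *= x + j
    return result

answers = {n: least_degree(n) for n in (2, 6, 7, 8, 12, 16)}
divisible = all(
    rising_product(x, least_degree(n)) % n == 0
    for n in answers
    for x in range(-20, 21)
)
```

For instance $n = 16$ needs degree $6$, since $5! = 120$ is not a multiple of $16$ but $6! = 720$ is.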


17. Let $f$ be a polynomial such that $f(n) \in \mathbb{Z}$ for all $n \in \mathbb{Z}$. Prove that
for any distinct integers $m, n$ the number $\operatorname{lcm}[1, 2, \dots, \deg(f)] \cdot \frac{f(m) - f(n)}{m - n}$ is an
integer.
MOSP 2001 


Proof. Fix $m, n$ distinct integers and let $d = m - n$ and $g(X) = f(n + X)$. Then
$g$ has rational coefficients, sends integers to integers and $\deg(f) = \deg(g)$. So,
we need to prove that
$$\operatorname{lcm}[1, 2, \dots, \deg(g)] \cdot \frac{g(d) - g(0)}{d}$$
is an integer. Let $D = \deg g$, so, using theorem 10.3, there exist integers
$a_0, a_1, \dots, a_D$ such that
$$g(X) = \sum_{i=0}^{D} a_i \binom{X}{i}.$$
It is thus enough to prove that for any $1 \le i \le D$ we have
$$\operatorname{lcm}[1, 2, \dots, D] \cdot \frac{1}{d}\binom{d}{i} \in \mathbb{Z}.$$
But the left-hand side is equal to $\frac{\operatorname{lcm}[1, 2, \dots, D]}{i}\binom{d-1}{i-1}$, which is clearly an integer.
The result follows. □
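
A small numerical check of problem 17, under the assumption of one sample integer-valued polynomial of degree $4$ (chosen only for illustration). The helper `binom` evaluates $\binom{x}{k}$ at arbitrary integers; `math.lcm` requires Python 3.9 or later.

```python
from math import factorial, lcm

def binom(x, k):
    """Value of the Hilbert polynomial binom(x, k) at any integer x."""
    num = 1
    for j in range(k):
        num *= x - j
    return num // factorial(k)  # exact: k consecutive integers are divisible by k!

def f(x):
    # Sample integer-valued polynomial of degree 4 with non-integer coefficients.
    return binom(x, 4) + 3 * binom(x, 2) - 5

D = 4
L = lcm(*range(1, D + 1))  # lcm[1, 2, 3, 4]

all_integral = all(
    (L * (f(m) - f(n))) % (m - n) == 0
    for m in range(-15, 16)
    for n in range(-15, 16)
    if m != n
)
```

The factor $L$ cannot simply be dropped in general: for $f = \binom{X}{4}$ one has $(f(4) - f(0))/4 = 1/4$.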


18. Let $f$ be a polynomial of degree $d$ such that $f(\mathbb{Z}) \subset \mathbb{Z}$ and for which
$f(m) - f(n)$ is a multiple of $m - n$ for all $0 \le m \ne n \le d$. Prove that
$f(m) - f(n)$ is a multiple of $m - n$ for all integers $m, n$ with $m \ne n$.


Holden Lee 


Proof. Since $f(\mathbb{Z}) \subset \mathbb{Z}$, theorem 10.3 yields the existence of integers $a_0, a_1, \dots,
a_d$ such that
$$f(X) = \sum_{k=0}^{d} a_k \binom{X}{k}.$$
The result follows now from the previous problem and the following general
result. □

Lemma 10.5. Let $a_i \in \mathbb{Z}$ and let
$$f(X) = \sum_{k=0}^{d} a_k \binom{X}{k} \quad\text{and}\quad L_k = \operatorname{lcm}(1, 2, \dots, k)$$
(by convention $L_0 = 1$). Then the following assertions are equivalent:

a) $m - n$ divides $f(m) - f(n)$ for all $0 \le m \ne n \le d = \deg f$.
b) $L_k$ divides $a_k$ for all $0 \le k \le d$.
c) $m - n$ divides $f(m) - f(n)$ for all $m \ne n \in \mathbb{Z}$.


Proof. Suppose that a) holds. We will prove by induction on $i$ that $L_i$
divides $a_i$. This is clear for $i = 0$, so assume that $a_0, \dots, a_{i-1}$ are multiples of
$L_0, L_1, \dots, L_{i-1}$ and fix $0 \le j < i$. Then $i - j$ divides
$$f(i) - f(j) = a_i + \sum_{k=0}^{i-1} a_k\left(\binom{i}{k} - \binom{j}{k}\right).$$
By the previous problem and the inductive hypothesis, each of the numbers
$a_k\left(\binom{i}{k} - \binom{j}{k}\right)$ with $0 \le k < i$ is a multiple of $i - j$. We deduce that $i - j$
divides $a_i$ and since $j < i$ was arbitrary, it follows that $L_i$ divides $a_i$. Hence
a) implies b). The previous problem shows that b) implies c) and since it is
trivial that c) implies a), the result follows. We are done. □


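Remark 10.6. The fact that the polynomial $f(X) = \sum_{k=0}^{d} a_k \cdot \binom{X}{k}$ satisfies
$\frac{f(a) - f(b)}{a - b} \in \mathbb{Z}$ for all $a \ne b \in \mathbb{Z}$ if and only if $\operatorname{lcm}(1, 2, \dots, k) \mid a_k$ for all $k$ seems
to have been first noticed by R. R. Hall and I. Z. Ruzsa.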

The following problem strongly suggests using Lagrange’s interpolation 
theorem, but there are some difficulties in making the argument work, since the 
polynomials appearing in Lagrange’s formula don’t have integer coefficients. 
The problem is quite tricky and makes again use of Hilbert’s polynomials. 


19. a) Prove that for all positive integers $n$ there is a polynomial $f \in \mathbb{Z}[X]$
such that all numbers $f(1) < f(2) < \dots < f(n)$ are powers of 2.
b) Let $a > 1$ be an integer and let $n$ be a positive integer. Prove that
there exists a polynomial $f$ of degree $n$, having integer coefficients,
such that $f(0), f(1), \dots, f(n)$ are pairwise distinct positive integers,
all of the form $2a^k + 3$ for some integer $k$.


Chinese TST 2004 




Proof. a) We will choose $f$ of the form
$$f(X) = A\sum_{i=0}^{n} \binom{X}{i} B^i$$
for some suitable integers $A, B$, to be chosen later. By the binomial formula,
for any $0 \le i \le n$ we have $f(i) = A(1 + B)^i$. Now, of course we want $f$ to have
integral coefficients and unfortunately $\binom{X}{i}$ does not have integral coefficients.
However, $n!\binom{X}{i}$ has integral coefficients for all $1 \le i \le n$. Note however that
we cannot simply take for $B$ a multiple of $n!$, since then $A(1 + B)^i$ has no chance
of being a power of 2. However, we can profit from the presence of $A$: take $A$
to be $2^{v_2(n!)}$ and $B$ an odd multiple of the greatest odd divisor of $n!$. Then the
previous remarks show that $f$ has integral coefficients. Finally, we want $1 + B$
to be a power of 2. Thus, we can choose (for example) $1 + B = 2^{\varphi(d)}$ where $d$
is the largest odd divisor of $n!$. With these choices, $f(1) < f(2) < \dots < f(n)$
are powers of 2, as needed.
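
The construction in part a) is completely explicit, so it can be carried out by machine. The sketch below computes $A = 2^{v_2(n!)}$ and $B = 2^{\varphi(d)} - 1$ as in the proof, then checks that $i! \mid AB^i$ (which guarantees integer coefficients) and that the values are increasing powers of 2. All function names are hypothetical, and Euler's function is computed by brute force for simplicity.

```python
from math import comb, factorial, gcd

def euler_phi(m):
    """Euler's totient, by direct counting (fine for small m)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

def choose_parameters(n):
    """Return (A, B) with A = 2^{v_2(n!)} and 1 + B = 2^{phi(d)}, d the odd part of n!."""
    nf = factorial(n)
    v = (nf & -nf).bit_length() - 1  # 2-adic valuation of n!
    d = nf >> v                       # largest odd divisor of n!
    return 2**v, 2 ** euler_phi(d) - 1

def f_value(n, x):
    """f(x) = A * sum_{i=0}^n binom(x, i) * B^i for an integer 0 <= x <= n."""
    A, B = choose_parameters(n)
    return A * sum(comb(x, i) * B**i for i in range(n + 1))

def is_power_of_two(v):
    return v > 0 and v & (v - 1) == 0

n = 4
A, B = choose_parameters(n)
values = [f_value(n, x) for x in range(1, n + 1)]
integral = all((A * B**i) % factorial(i) == 0 for i in range(n + 1))
```

For $n = 4$ this gives $A = 8$, $B = 3$ and the values $32, 128, 512, 2048$.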

b) This uses the same idea as a). Write $n! = m \cdot q$, where all prime factors
of $m$ are among those of $a$ and where $\gcd(q, a) = 1$. Let $b = a^{\varphi(q)} - 1$, so $q$
divides $b$. Finally, define
$$f(X) = 2a^m\sum_{i=0}^{n} b^i\binom{X}{i} + 3.$$
It has integer coefficients because $i! \mid n! \mid a^m \cdot b$ for all $0 \le i \le n$. Moreover, for
$0 \le i \le n$ we have
$$f(i) = 2a^m \cdot (b + 1)^i + 3 = 2a^{m + i\varphi(q)} + 3. \qquad \square$$


Remark 10.7. It is also true that for all positive integers $n$ there is a polynomial
$f \in \mathbb{Z}[X]$ such that all numbers $f(1) < f(2) < \dots < f(n)$ are prime numbers.
We will try to find such a polynomial of the form
$$f(X) = a_1(X - 2)(X - 3)\cdots(X - n) + a_2(X - 1)(X - 3)\cdots(X - n) + \dots + a_n(X - 1)\cdots(X - n + 1) + 1,$$
for some suitable integers $a_i$. The reason is that $f(i)$ only depends on $a_i$, so
that it will be rather easy to adjust $a_i$ in order to make $f(i)$ prime. Indeed,
note that
$$f(i) = (-1)^{n-i}(n - i)!(i - 1)!\,a_i + 1.$$
Now, we will choose the $a_i$'s inductively so that $f(1) < \dots < f(n)$ are primes.
We will heavily use theorem 9.6, according to which for all $n$ there are infinitely
many primes of the form $1 + kn$. Thus, there exists an integer $a_1$ such that
$1 + (-1)^{n-1}(n - 1)!\,a_1 = f(1)$ is a prime. Fix such $a_1$ and choose (again by
the cited result) an integer $a_2$ such that $1 + a_2(n - 2)!(-1)^{n-2} = f(2)$ is a
prime greater than $f(1)$. Continuing like this, we find $a_1, a_2, \dots, a_n$ such that
$f(1) < f(2) < \dots < f(n)$ are primes and the problem is solved.


20. Suppose that $n$ is a positive integer not divisible by the cube of a prime
number. Consider all sequences $(x_1, x_2, \dots, x_n)$ with $x_i \in \mathbb{Z}/n\mathbb{Z}$. For
how many of these can we find a polynomial $f$ with integer coefficients
such that $f(i) \pmod n = x_i$ for all $i$?


USA TST 2008 
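Proof. Let $A_n$ be the additive group of those sequences $(x_1, x_2, \dots, x_n) \in
(\mathbb{Z}/n\mathbb{Z})^n$ associated to integer polynomials, as in the problem. We will show
that the map
$$\Phi(\bar a_0, \bar a_1, \dots, \bar a_{n-1}) = (f(i) \bmod n)_{1 \le i \le n},$$
where
$$f(X) = a_0 + \sum_{i=1}^{n-1} a_i \prod_{j=1}^{i} (X - j),$$
is an isomorphism of abelian groups $\Phi : \prod_{k=0}^{n-1} \left(\mathbb{Z} / \tfrac{n}{\gcd(n, k!)}\mathbb{Z}\right) \simeq A_n$. First,
note that $\Phi$ is well-defined: indeed, since a product of $d$ consecutive integers
is a multiple of $d!$, it is clear that the sequence $(f(i))_{1 \le i \le n}$ does not depend
on the choice of representatives $a_i$ for $\bar a_i$.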

Let us prove that $\Phi$ is surjective. Repeated division algorithm shows that
any polynomial with integer coefficients of degree at most $d$ can be written in
the form
$$f(X) = a_0 + \sum_{i=1}^{d} a_i \prod_{j=1}^{i} (X - j)$$
for some integers $a_i$. We may restrict to $d < n$ since all $a_k$ with $k \ge n$ do not
matter when considering $f(i) \pmod n$. This yields the surjectivity.

It is clear that $\Phi$ is a group homomorphism. It remains to prove that $\Phi$ is
injective, so suppose $f$ satisfies $f(i) \equiv 0 \pmod n$ for $1 \le i \le n$. We want to
show that $a_k$ is a multiple of $\frac{n}{\gcd(n, k!)}$ for $0 \le k < n$. Assuming the contrary,
there is some least $k$ for which this does not hold. Then we may assume $a_j = 0$
for $j < k$ (since replacing them by $0$ does not change the values of $f$ mod $n$).
But then plugging in $X = k + 1$ gives $f(k + 1) = k!\,a_k \equiv 0 \pmod n$, and $a_k$
is a multiple of $\frac{n}{\gcd(n, k!)}$, contrary to our assumption.

Thus the number of polynomial sequences $(x_1, \dots, x_n)$ is
$$N = \prod_{k=0}^{n-1} \frac{n}{\gcd(n, k!)}.$$
If a prime $p$ has $v_p(n) = 1$, then there are $p$ factors in this product (those with
$k = 0, 1, \dots, p - 1$) which are multiples of $p$. If a prime $p$ has $v_p(n) = 2$, then
there are $p$ factors which are multiples of $p^2$ ($k = 0, \dots, p - 1$) and $p$ more
factors which are multiples of just $p$ ($k = p, \dots, 2p - 1$). Therefore for $n$ which
are not divisible by the cube of a prime, we have
$$N = \prod_{p \,:\, v_p(n) = 1} p^p \cdot \prod_{p \,:\, v_p(n) = 2} p^{3p}. \qquad \square$$
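
For small $n$ the count can be confirmed by brute force. The sketch below compares the product formula $\prod_{k=0}^{n-1} n/\gcd(n, k!)$ with a direct enumeration; as noted in the surjectivity argument, it suffices to enumerate polynomials of degree below $n$, and only the coefficients mod $n$ matter. The function names are hypothetical.

```python
from itertools import product
from math import factorial, gcd

def count_by_formula(n):
    """The product of n / gcd(n, k!) over k = 0, ..., n - 1."""
    result = 1
    for k in range(n):
        result *= n // gcd(n, factorial(k))
    return result

def count_by_brute_force(n):
    """Number of distinct sequences (f(1), ..., f(n)) mod n, f of degree < n."""
    # powers[x - 1][j] = x^j mod n for x = 1, ..., n
    powers = [[pow(x, j, n) for j in range(n)] for x in range(1, n + 1)]
    sequences = set()
    for coeffs in product(range(n), repeat=n):
        seq = tuple(
            sum(c * pw for c, pw in zip(coeffs, row)) % n for row in powers
        )
        sequences.add(seq)
    return len(sequences)

formula = [count_by_formula(n) for n in range(2, 7)]
brute = [count_by_brute_force(n) for n in range(2, 7)]
```

The two lists agree; for example $n = 4$ gives $2^{3 \cdot 2} = 64$ and $n = 6$ gives $2^2 \cdot 3^3 = 108$.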
Proof. For a polynomial $f \in \mathbb{Z}[X]$ and $n \ge 1$, let
$$S_n(f) = (f(1) \pmod n, f(2) \pmod n, \dots, f(n) \pmod n) \in (\mathbb{Z}/n\mathbb{Z})^n$$
and let $A_n = \{S_n(f) \mid f \in \mathbb{Z}[X]\}$. We want to find $|A_n|$, at least when $n$ is
cube-free. First, we will prove the following


Lemma 10.8. $|A_n|$ is multiplicative, i.e.
$$|A_{mn}| = |A_m| \cdot |A_n|$$
for all $\gcd(m, n) = 1$.




Proof. We will exhibit a bijection between $A_{mn}$ and $A_m \times A_n$. Consider the
map
$$\varphi : A_{mn} \to A_m \times A_n, \qquad \varphi(S_{mn}(f)) = (S_m(f), S_n(f)).$$
Note that it is well-defined, for if $S_{mn}(f) = S_{mn}(g)$, we have $f(x) \equiv g(x)
\pmod{mn}$ for all $1 \le x \le mn$ and so $S_m(f) = S_m(g)$ and $S_n(f) = S_n(g)$. We
claim that $\varphi$ is bijective. Injectivity is very easy, for if $\varphi(S_{mn}(f)) = \varphi(S_{mn}(g))$,
then $m$ divides $(f - g)(i)$ for all $1 \le i \le m$ and $n$ divides $(f - g)(i)$ for all
$1 \le i \le n$. By the division algorithm, it follows that $m$ divides $(f - g)(x)$ for
all integers $x$ and the same for $n$. Since $\gcd(m, n) = 1$, we have $mn \mid f(x) - g(x)$
for all $x$ and we are done. For surjectivity, we need to prove that if $f, g \in \mathbb{Z}[X]$
are arbitrary, then there exists $h \in \mathbb{Z}[X]$ such that $h(i) \equiv f(i) \pmod n$ for
all $1 \le i \le n$ and $h(i) \equiv g(i) \pmod m$ for all $1 \le i \le m$. Simply choose
$h = Amf + Bng$, where $A, B$ are integers such that $Am \equiv 1 \pmod n$ and
$Bn \equiv 1 \pmod m$. The lemma is proved. □


Thanks to this lemma and the hypothesis that $n$ is cube free, it remains
to find $|A_p|$ and $|A_{p^2}|$. The first task is trivial: for any $p$-tuple $(a_1, \dots, a_p)$ of
remainders mod $p$ there exists a polynomial $f$ such that $f(i) \equiv a_i \pmod p$ for
all $1 \le i \le p$, by Lagrange's interpolation formula. Thus $|A_p| = p^p$. Finding
$|A_{p^2}|$ is more delicate, since $\mathbb{Z}/p^2\mathbb{Z}$ is not a field and so Lagrange's interpolation
formula is useless. Moreover, there are nontrivial relationships between the
numbers $f(i) \pmod{p^2}$, which implies that $|A_{p^2}|$ is definitely smaller than
$p^{2p^2}$.

The first point is to note that
$$A_{p^2} = \{S_{p^2}(f) \mid f \in \mathbb{Z}[X],\ \deg(f) < 2p\},$$
as for any $f \in \mathbb{Z}[X]$ one can find $g \in \mathbb{Z}[X]$ of degree smaller than $2p$ and such
that $f(x) \equiv g(x) \pmod{p^2}$ for all $x$. Indeed, simply take for $g$ the remainder
of $f$ when divided by $(X^p - X)^2$. Let $G = \{f \in \mathbb{Z}/p^2\mathbb{Z}[X] \mid \deg(f) < 2p\}$, an
abelian group of order $(p^2)^{2p} = p^{4p}$ and let $S : G \to (\mathbb{Z}/p^2\mathbb{Z})^{p^2}$ be defined
by $S(f) = (f(1), f(2), \dots, f(p^2))$. The previous observation shows that $A_{p^2}$ is
the image of the map $S : G \to (\mathbb{Z}/p^2\mathbb{Z})^{p^2}$. As $|\operatorname{Im}(S)| = \frac{|G|}{|\operatorname{Ker}(S)|}$, it remains
to find $|\operatorname{Ker}(S)|$. We can actually describe $\operatorname{Ker}(S)$ rather explicitly, thanks
to problem 15. Indeed, this problem (for $k = 2$) implies that $\operatorname{Ker}(S)$ is in
bijection with the set of polynomials $u \in \mathbb{F}_p[X]$ of degree smaller than $p$. As
this set has $p^p$ elements, it follows that $|A_{p^2}| = p^{3p}$. The problem is finally
solved. □


10.4 p-adic estimates 


In this section we consider a few problems dealing with p-adic properties 
of polynomials. 


21. Let $(a_n)_{n \ge 1}$ be an increasing sequence of positive integers such that for
some polynomial $f \in \mathbb{Z}[X]$ we have $a_n \le f(n)$ for all $n$. Suppose also
that $m - n \mid a_m - a_n$ for all distinct positive integers $m, n$. Prove that
there exists a polynomial $g \in \mathbb{Q}[X]$ such that $a_n = g(n)$ for all $n$.


USAMO 1995 


Proof. Let $d$ be the degree of $f$ and choose a polynomial $P$ of degree at most
$d$ with rational coefficients and such that $P(i) = a_i$ for $1 \le i \le d + 1$. This
is possible by Lagrange's interpolation formula. Choose (and fix) $N \ge 1$ such
that $h = NP$ has integral coefficients. Then $h(i) = Na_i$ for $1 \le i \le d + 1$ and
$h$ has degree at most $d$.
Fix any integer $n > d + 1$ and observe that $m - n$ divides $Na_m - Na_n$ and
$m - n$ divides $h(m) - h(n)$. Thus, if $m \le d + 1$, then $m - n$ divides $Na_n - h(n)$.
Consequently, $Na_n - h(n)$ is a multiple of $\operatorname{lcm}(n - 1, \dots, n - (d + 1))$. Note
that $|Na_n - h(n)| \le Cn^d$ for some constant $C$, because $a_n$ is bounded by $f(n)$
and because $h$ has degree at most $d$. On the other hand, we have the following
result:


Lemma 10.9. For any positive integers $x_1, x_2, \dots, x_n$, $\operatorname{lcm}(x_1, x_2, \dots, x_n)$ is
at least
$$\frac{x_1x_2\cdots x_n}{\prod_{1 \le i < j \le n} \gcd(x_i, x_j)}.$$

Proof. It is enough to prove that for any prime $p$, the $p$-adic valuation of
$\operatorname{lcm}(x_1, x_2, \dots, x_n)$ is at least that of $\frac{x_1x_2\cdots x_n}{\prod_{1 \le i < j \le n} \gcd(x_i, x_j)}$. Writing $y_i = v_p(x_i)$,
this comes down to the inequality
$$\max(y_i) \ge \sum_{i} y_i - \sum_{i < j} \min(y_i, y_j),$$
which is clear (simply order the $y_i$'s). □
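
The valuation argument in fact shows a divisibility: $x_1 \cdots x_n$ divides $\operatorname{lcm}(x_1, \dots, x_n) \cdot \prod_{i<j} \gcd(x_i, x_j)$. A minimal sketch checking this on all triples and a sample of quadruples of small integers follows; the function name `lemma_holds` is hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from itertools import combinations, product
from math import gcd, lcm, prod

def lemma_holds(xs):
    """Check that prod(xs) divides lcm(xs) * prod of pairwise gcds."""
    pairwise = prod(gcd(a, b) for a, b in combinations(xs, 2))
    return (lcm(*xs) * pairwise) % prod(xs) == 0

triples_ok = all(lemma_holds(xs) for xs in product(range(1, 13), repeat=3))
quadruples_ok = all(lemma_holds(xs) for xs in product(range(1, 8), repeat=4))
```

For two numbers the statement is the familiar identity $\operatorname{lcm}(x, y) \cdot \gcd(x, y) = xy$.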


Coming back to the problem, we infer that
$$\operatorname{lcm}(n - 1, n - 2, \dots, n - (d + 1)) \ge \frac{(n - 1)(n - 2)\cdots(n - d - 1)}{\prod_{1 \le i < j \le d + 1} \gcd(n - i, n - j)},$$
which is greater than $C_1n^{d+1}$ for some constant $C_1 > 0$ and all large $n$ (this
is because $\gcd(n - i, n - j)$ divides $j - i$). Thus, if $n$ is large enough, then
necessarily
$$\operatorname{lcm}(n - 1, n - 2, \dots, n - (d + 1)) > |Na_n - h(n)|.$$
Combining this with the result of the previous paragraph, we infer that there
is $n_0$ such that for all $n \ge n_0$ we have $Na_n = h(n)$.

Finally, pick any $m \ge 1$ and observe that for all $n \ge n_0$ we have
$m - n \mid Na_m - Na_n$ and $m - n \mid h(m) - h(n)$. Since $Na_n = h(n)$, we deduce that
$m - n$ divides $Na_m - h(m)$ for all $n \ge n_0$, forcing $Na_m = h(m)$. Thus, we
proved the existence of a polynomial $g = \frac{1}{N}h$, with rational coefficients, such
that $a_n = g(n)$ for all $n$. □


Remark 10.10. There are a lot of non-polynomial maps $f$ such that $m - n$
divides $f(m) - f(n)$ for all $m \ne n$. For instance, pick any sequence of integers
$a_n$, infinitely many of which are nonzero and define
$$f(n) = \sum_{j \ge 0} a_j (n + j)(n + j - 1)\cdots(n - j).$$
It is an easy exercise to check that $f$ satisfies the desired congruences and that
$f$ is not polynomial.
Remark 10.11. It is not true that if a, = g(n) for all n, then g € Z[X]. For 


instance, one can easily check that if g(n) = nn then glm)—g(n) is an integer 
for all different integers m and n. 


22. Consider all sequences $(f(1) \pmod n, f(3) \pmod n, \dots, f(1023)
\pmod n)$, where $n = 1024$ and $f$ is an arbitrary polynomial with integer
coefficients. Prove that at most $2^{35}$ of these sequences are permutations
of $1, 3, 5, \dots, 1023 \pmod n$.
USA TST 2007 


Proof. Define $P_0(X) = 1$ and $P_i(X) = \prod_{k=1}^{i} (X - (2k - 1))$ for $i \ge 1$. Repeated
euclidean division (taking into account that $P_i$ is monic) shows that each
polynomial $f \in \mathbb{Z}[X]$ can be written $f = c_0P_0 + c_1P_1 + \dots + c_nP_n$ for integers
$c_i$, where $n = \deg f$.

Note that $P_i(2n - 1)$ is a multiple of $2^i \cdot i!$, so by Legendre's formula we
have $v_2(P_i(x)) \ge a_i$ for all odd numbers $x$, where $a_i = \sum_{j \ge 0} \left\lfloor \frac{i}{2^j} \right\rfloor$. Note that
$a_0 = 0$, $a_1 = 1$, $a_2 = 3$, $a_3 = 4$, $a_4 = 7$, $a_5 = 8$, and $a_i \ge 10$ for $i \ge 6$.

Say a polynomial $f$ is good if its associated sequence (as in the statement
of the problem) is a permutation of $1, 3, 5, \dots, 1023$. Let
$$f(X) = \sum_{0 \le i \le n} c_iP_i(X)$$
be a good polynomial. If we delete the terms with $P_i$ for $i \ge 6$ (where $a_i \ge 10$),
we get a polynomial with the same associated sequence as $f$, so we are only
interested in $c_0, c_1, \dots, c_5$. Note that $c_0$ is odd and that $f(1) \not\equiv f(3) \pmod 4$,
as $f(1) \equiv f(4k + 1) \pmod 4$ and $f(3) \equiv f(4k + 3) \pmod 4$ for all $k$. But
since
$$f(3) - f(1) = c_1(P_1(3) - P_1(1)) \equiv 2c_1 \pmod 4,$$
it follows that $c_1$ is odd. Finally, note that if we mod out $c_i$ by $2^{10 - a_i}$, we get a
polynomial with the same associated sequence as $f$, so we are only interested
in the values of $c_i \pmod{2^{10 - a_i}}$. But since $c_0$ is odd, there are at most $2^9$
choices for it; for the same reason there are at most $2^8$ choices for $c_1 \pmod{2^9}$
and for $2 \le i \le 5$ there are at most $2^{10 - a_i}$ choices for $c_i \pmod{2^{10 - a_i}}$. Hence
the number of complete remainder sequences is at most
$$2^9 \cdot 2^8 \cdot \prod_{i=2}^{5} 2^{10 - a_i} = 2^9 \cdot 2^8 \cdot 2^7 \cdot 2^6 \cdot 2^3 \cdot 2^2 = 2^{35}. \qquad \square$$
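
The bookkeeping in this proof is easy to reproduce. The sketch below recomputes the exponents $a_i$, the final exponent $35$, and spot-checks the divisibility $2^{a_i} \mid P_i(x)$ for odd $x$; the function names `a` and `P` mirror the notation of the proof but are otherwise hypothetical helpers.

```python
def a(i):
    """a_i = sum over j >= 0 of floor(i / 2^j), i.e. v_2(2^i * i!)."""
    total = 0
    while i:
        total += i
        i //= 2
    return total

def P(i, x):
    """P_i(x) = product over k = 1..i of (x - (2k - 1)); P_0 = 1."""
    result = 1
    for k in range(1, i + 1):
        result *= x - (2 * k - 1)
    return result

exponents = [a(i) for i in range(7)]

# c_0 odd: 2^9 choices; c_1 odd mod 2^9: 2^8 choices; then 2^(10 - a_i) for i = 2..5.
total_exponent = 9 + 8 + sum(10 - a(i) for i in range(2, 6))

divisibility_ok = all(
    P(i, x) % 2 ** a(i) == 0 for i in range(7) for x in range(1, 1024, 2)
)
```

The computed exponents are $0, 1, 3, 4, 7, 8, 10$, matching the values listed in the proof.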


It is a classical result of Mahler that continuous functions on the ring of
$p$-adic integers $\mathbb{Z}_p$ can be uniformly approximated by polynomials (this is the
$p$-adic analogue of a classical theorem of Weierstrass, stating that continuous
functions on a compact interval can be uniformly approximated by
polynomials). It is not difficult to see that the characteristic function of $2\mathbb{Z}_2 + 1$ is
continuous, so it can be uniformly approximated with polynomials. The next
problem gives a more precise estimate.


23. Prove that for all $n$ there exists a polynomial $f$ with integer coefficients
and degree not exceeding $n$ such that $2^n$ divides $f(x)$ for all even integers
$x$ and $2^n$ divides $f(x) - 1$ for all odd integers $x$.


P. Hajnal, KoMaL 


Proof. Define $O(x) = 1$ if $x$ is odd and $0$ otherwise. Then we compute that
$$O(m) = \frac{1}{2}\left(1 - (-1)^m\right) = \frac{1}{2}\left(1 - (1 - 2)^m\right) = \sum_{k \ge 1} \binom{m}{k} (-2)^{k-1}.$$
Since the binomial coefficients are all integers it follows that truncating after
the nth term gives a polynomial
$$g(X) = \sum_{k=1}^{n} \binom{X}{k} (-2)^{k-1} \in \mathbb{Q}[X]$$
of degree $n$ such that for all integers $m$, $g(m)$ is an integer with $g(m) \equiv O(m)
\pmod{2^n}$. Recall that $v_2(k!) = k - s_2(k) \le k - 1$. Therefore every coefficient
of $g$ has an odd denominator. Hence we can choose a polynomial $f(X) \in
\mathbb{Z}[X]$ of degree at most $n$ such that every coefficient of $f$ is congruent to the
corresponding coefficient of $g$ mod $2^n$. Then $f(m) \equiv O(m) \pmod{2^n}$ for all
$m$ and we are done. □
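
This first proof translates directly into a computation. The sketch below expands $g(X) = \sum_{k=1}^{n} \binom{X}{k}(-2)^{k-1}$ with exact rational arithmetic, then replaces each coefficient (whose denominator is odd) by an integer congruent to it mod $2^n$. The function names are hypothetical, and `pow(x, -1, m)` for modular inverses requires Python 3.8 or later.

```python
from fractions import Fraction
from math import factorial

def falling_factorial_coeffs(k):
    """Coefficients (lowest degree first) of X(X-1)...(X-k+1)."""
    coeffs = [1]
    for j in range(k):
        shifted = [0] + coeffs                    # X * current polynomial
        scaled = [-j * c for c in coeffs] + [0]   # -j * current polynomial
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def integer_polynomial(n):
    """Integer coefficients (lowest degree first) of an f with f = O mod 2^n."""
    modulus = 2**n
    g = [Fraction(0)] * (n + 1)
    for k in range(1, n + 1):
        scale = Fraction((-2) ** (k - 1), factorial(k))
        for j, c in enumerate(falling_factorial_coeffs(k)):
            g[j] += c * scale
    # Every denominator is odd, hence invertible modulo 2^n.
    return [(c.numerator * pow(c.denominator, -1, modulus)) % modulus for c in g]

def works(n):
    """Check f(m) = O(m) mod 2^n on a range of integers m, negative ones included."""
    f = integer_polynomial(n)
    return all(
        (sum(c * m**j for j, c in enumerate(f)) - (m % 2)) % 2**n == 0
        for m in range(-40, 41)
    )

checks = [works(n) for n in range(1, 9)]
```

For $n = 2$ one gets $g(X) = 2X - X^2$ and hence, for example, $f(X) = 2X + 3X^2$.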


Proof. Define $O(x) = 1$ if $x$ is odd and $0$ otherwise. We want to find a
polynomial $f$ of degree at most $n$ and such that $f(x) \equiv O(x) \pmod{2^n}$ for all
integers $x$. We will prove the existence by induction, but in order to prove the
inductive step, we will need the following key result:


Lemma 10.12. Let $f$ be a polynomial of degree at most $n$ and let
$$F(x) = f(x) - O(x).$$
Then for all integers $x$ we have
$$F(x + n + 1) \equiv \sum_{i=0}^{n} (-1)^{n-i}\binom{n+1}{i} F(x + i) \pmod{2^n}.$$

Proof. Since $f$ has degree at most $n$, proposition 10.4 (multiplied by
$(-1)^{n+1}$) shows that
$$\sum_{k=0}^{n+1} (-1)^k \binom{n+1}{k} f(x + k) = 0.$$
Thus, to prove the lemma it remains to prove that $2^n$ divides
$$\sum_{k=0}^{n+1} (-1)^k \binom{n+1}{k} O(x + k).$$
However, if $x$ is even this is equal to $-2^n$, because
$$\sum_{k} \binom{n+1}{2k+1} = 2^n.$$
Similarly, if $x$ is odd, then it is equal to $2^n$, because
$$\sum_{k} \binom{n+1}{2k} = 2^n.$$
This ends the proof of the lemma. □


The crucial point that we deduce from this lemma is the following: if
$f(x) \equiv O(x) \pmod{2^n}$ for all $0 \le x \le n$, then $f(x) \equiv O(x) \pmod{2^n}$ for all
integers $x$. This follows trivially by induction, using the congruence in the
lemma.

We can now finally prove by induction the existence of $f$. For $n = 1$,
everything is clear, so assume that we constructed a polynomial $f$ of degree
at most $n - 1$ such that $f(x) \equiv O(x) \pmod{2^{n-1}}$ for all integers $x$. We will
choose our polynomial of the form
$$g(X) = f(X) + \sum_{k=0}^{n} a_k \cdot \prod_{j \ne k,\ 0 \le j \le n} (X - j).$$
It clearly has degree at most $n$ and for all $0 \le k \le n$ we have
$$g(k) = f(k) + (-1)^{n-k}a_k \cdot k!(n - k)!.$$
By hypothesis, there are integers $b_k$ such that $f(k) = O(k) + b_k \cdot 2^{n-1}$ for all
$0 \le k \le n$. Now, since
$$v_2(k!(n - k)!) \le v_2(n!) \le n - 1,$$
it is clear that we can choose integers $a_k$ such that $2^n$ divides
$$2^{n-1}b_k + (-1)^{n-k}a_k \cdot k!(n - k)!.$$
By construction, $2^n$ will divide $g(k) - O(k)$ for all $0 \le k \le n$ and thus for all
$x$, by the previous results. The inductive step is thus proved and the problem
is solved. □


We end this section with a challenging problem having a very elementary 
but delicate solution. The main ideas are the pigeonhole principle and p-adic 
estimates, but the way in which they are combined is rather subtle. A special 
case of this problem (whose proof is very similar) is discussed in chapter 3, 
problem 13. 


24. Let $f \in \mathbb{Z}[X]$ be a polynomial and let $a$ be an integer. Consider the
sequence $a_0 = a$, $a_{n+1} = f(a_n)$. If $a_n \to \infty$ and the set of prime divisors
of $(a_n)_n$ is finite, prove that $f(X) = AX^d$ for some $A, d$.


Tuymaada Olympiad, 2003 


Proof. Let $f^d$ be the composition of f with itself d times. Assume that
$p_1, p_2, \dots, p_N$ comprise all prime factors of all numbers $a_0, a_1, \dots$. First, we will
prove that one of the numbers $f(0), f^2(0), \dots, f^N(0)$ is 0. To do this, fix any
positive integer r and $n_0$ such that for all $n \ge n_0$ we have $a_n > (p_1 p_2 \cdots p_N)^r$.
Such $n_0$ exists by assumption. Thus, for all $n \ge n_0$ there exists i such that
$v_{p_i}(a_n) > r$. Considering N + 1 consecutive values $n = n_0, n_0 + 1, \dots, n_0 + N$,
the i's that we find take only N possible values, so two of them will be equal.
Thus, we can find $u \in \{1, 2, \dots, N\}$, $i \ge n_0$ and $k \in \{1, 2, \dots, N\}$ such that
$v_{p_k}(a_i) > r$ and $v_{p_k}(a_{i+u}) > r$. Since $a_{i+u} = f^u(a_i) \equiv f^u(0) \pmod{a_i}$, it
follows that $p_k^r$ divides $f^u(0)$. Thus, we proved that for any r we can find
$u \in \{1, 2, \dots, N\}$ and k such that $p_k^r$ divides $f^u(0)$. It follows that one of
$f(0), f^2(0), \dots, f^N(0)$ has infinitely many divisors and the claim is proved.

If $f^k(0) = 0$, then working with $f^k$ instead of f (the iterates of a under $f^k$
form a subsequence of the sequence $(a_n)_n$, and so will still have only finitely
many prime divisors), we may assume that f(0) = 0 (it is easy to see that if
$f^i(X)$ is of the form $AX^d$ for some $i \ge 1$, then f itself is of this form). Write
then $f(X) = X^d(b_0 + b_1 X + \cdots + b_e X^e)$ for some integers $b_0, \dots, b_e$ and some
$d \ge 1$ such that $b_0 b_e \ne 0$. Assume that the conclusion of the problem does
not hold, so that $e \ge 1$. Note that $a_n$ divides $a_{n+1}$. Since for each j there
is $n_j$ such that $a_{n_j}$ is a multiple of $p_j$, the number $a_{n_1 + \cdots + n_N}$ is a multiple of
$p_1 \cdots p_N$ and so there is $n_0$ such that for all $n \ge n_0$ we have $p_1 \cdots p_N \mid a_n$. It
follows easily from this that the prime factors of $b_0$ are all amongst the $p_i$.

Consider now some i such that $p_i \mid b_0$. Then
\[ v_{p_i}(a_{n+1}) = d\, v_{p_i}(a_n) + v_{p_i}(b_0 + b_1 a_n + \cdots + b_e a_n^e) \ge v_{p_i}(a_n) + \min(1, v_{p_i}(a_n)), \]
from which it trivially follows that $v_{p_i}(a_n) \to \infty$ for $n \to \infty$ (we know that for
large enough n we have $v_{p_i}(a_n) > 0$ by the previous paragraph). Now, we are
finally ready to conclude: let $p_{i_1}, \dots, p_{i_k}$ be those primes among $p_1, \dots, p_N$
that divide $b_0$. By the previous paragraph, eventually by increasing $n_0$, we
may assume that $v_{p_{i_j}}(a_n) > v_{p_{i_j}}(b_0)$ for all $1 \le j \le k$ and all $n \ge n_0$.
If q is a prime factor of $b_0 + \cdots + b_e a_n^e$ with $n \ge n_0$, then q divides $a_{n+1}$,
so q is one of the $p_i$'s. But all $p_i$'s divide $a_n$, so q must divide $b_0$ and so
$q \in \{p_{i_1}, \dots, p_{i_k}\}$. Moreover, since $v_{p_{i_j}}(a_n) > v_{p_{i_j}}(b_0)$ for all $1 \le j \le k$, we
have $v_{p_{i_j}}(b_0 + \cdots + b_e a_n^e) = v_{p_{i_j}}(b_0)$. Putting together these observations shows
that $b_0 + b_1 a_n + \cdots + b_e a_n^e$ divides $b_0$, hence $|b_0 + b_1 a_n + \cdots + b_e a_n^e| \le |b_0|$ for all sufficiently large n, contradicting the
fact that $a_n \to \infty$. The result finally follows. □


10.5 Miscellaneous problems 


The following problem is a version of a very classical topic, known as the
Prouhet-Tarry-Escott problem. It concerns disjoint sets of n integers having
the same sum of dth powers for all $1 \le d \le k$ (k depending on n).


25. Let d, r be positive integers with $d \ge 2$. Prove that for any nonconstant
polynomial f with real coefficients and of degree smaller than r, the
numbers $f(0), f(1), \dots, f(d^r - 1)$ can be divided into d disjoint groups
such that the sum of the elements of each group is the same.


J. O. Shallit, AMM E 3032 
Proof. Define the sets
\[ A_i(r) = \{x \in [0, d^r - 1] \mid s_d(x) \equiv i \pmod d\}, \]
where $s_d(x)$ is the sum of digits of x when written in base d. We will prove
that $\sum_{x \in A_i(r)} f(x)$ is independent of i, for any polynomial f of degree smaller
than r. The proof will be by induction on r, the case r = 1 being clear (since
then each $A_i(r)$ has exactly one element and f must be constant). Assuming
that the result holds for r, observe that (by linearity) it is enough to prove
that $\sum_{x \in A_i(r+1)} x^r$ is independent of i in order to prove the inductive step.
But, by definition (and considering the last digit in base d) we have
\[ A_i(r+1) = \bigcup_{j=0}^{d-1} \{dx + j \mid x \in A_{i-j}(r)\}, \]
the union being disjoint (and the sets $A_i$ being numbered mod d). Thus
\[ \sum_{x \in A_i(r+1)} x^r = \sum_{j=0}^{d-1} \sum_{x \in A_{i-j}(r)} (dx + j)^r. \]
Separate in the last sum the term corresponding to $x^r$ and observe that it does
not depend on i, since it is just
\[ d^r \sum_{x \in A_0(r) \cup \cdots \cup A_{d-1}(r)} x^r. \]
The other terms are independent of i by the induction hypothesis. Thus
$\sum_{x \in A_i(r+1)} x^r$ is also independent of i, finishing the proof of the inductive
step. □


Proof. This will use the same sets $A_i$, but the proof that they satisfy the
desired conditions is different. Let z be any dth root of unity different from 1
and let
\[ \Delta_a f(X) = f(X) + z f(X + a) + z^2 f(X + 2a) + \cdots + z^{d-1} f(X + (d-1)a). \]
Since $1 + z + \cdots + z^{d-1} = 0$, we have $\deg(\Delta_a f) < \deg(f)$.

Let our disjoint sets be $A_0, A_1, \dots, A_{d-1}$ as in the previous solution and
consider
\[ A(z) = \sum_{k=0}^{d-1} z^k \sum_{x \in A_k} f(x). \]
By an immediate induction, we obtain that
\[ A(z) = \left(\Delta_{d^{r-1}} \Delta_{d^{r-2}} \cdots \Delta_d \Delta_1 f\right)(0). \]
But $\deg(f) < r$ and each operator $\Delta_a$ lowers the degree, so that $\Delta_{d^{r-1}} \Delta_{d^{r-2}} \cdots \Delta_d \Delta_1 f$ is the zero polynomial and so A(z) = 0.
Since this holds for all such z, it follows that the polynomial A(t) is a multiple
of $1 + t + \cdots + t^{d-1}$ and hence $\sum_{x \in A_k} f(x)$ is independent of k. □
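The partition used in both proofs is easy to test numerically. The following is only a small sketch: the helper names `digit_sum` and `group_sums` are ours (hypothetical, not from the text), and the polynomial is an arbitrary example of degree smaller than r.

```python
def digit_sum(x, d):
    """Sum of the digits of x written in base d."""
    s = 0
    while x:
        s += x % d
        x //= d
    return s

def group_sums(coeffs, d, r):
    """Sum f(x) over each class A_i(r) = {0 <= x < d^r : s_d(x) = i (mod d)}.

    coeffs lists the coefficients of f, lowest degree first.
    """
    def f(x):
        return sum(c * x**k for k, c in enumerate(coeffs))
    sums = [0] * d
    for x in range(d**r):
        sums[digit_sum(x, d) % d] += f(x)
    return sums

# f(x) = 1 + 2x - 5x^2 + x^3 has degree 3 < r = 4; base d = 3.
print(group_sums([1, 2, -5, 1], 3, 4))
# f(x) = x^2 with r = 2 violates deg f < r, and the sums differ.
print(group_sums([0, 0, 1], 2, 2))
```

The second call illustrates that the degree condition cannot simply be dropped.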


The following problem is rather technical, but the idea is very simple: 
after some algebraic manipulations, everything comes down to the fact that a 
nonzero polynomial of degree at most d cannot vanish at more than d distinct 
points. 




26. Let $p \ge 5$ be a prime and let a, b, c be integers such that p does not
divide any of the numbers a - b, b - c, c - a. Let i, j, k be nonnegative
integers such that i + j + k is divisible by p - 1 and such that for all
integers x, the number
\[ (x - a)(x - b)(x - c)\left[(x - a)^i (x - b)^j (x - c)^k - 1\right] \]
is divisible by p. Prove that each of i, j, k is divisible by p - 1.
Kiran Kedlaya and Peter Shor, USA TST 2009 


Proof. We start with some formal reductions. First, note that we may assume
that $0 \le i, j, k < p - 1$, as we can replace i, j, k with their remainders mod
p - 1, without affecting the hypothesis or the conclusion (this uses Fermat's
little theorem). We want to prove that i = j = k = 0, so assume the contrary.
By hypothesis, i + j + k = p - 1 or 2(p - 1). In the second case, replace each
$z \in \{i, j, k\}$ with p - 1 - z. As this does not change the hypothesis or the
conclusion, we can assume from now on that i + j + k = p - 1. Finally, we can
clearly assume that i is the largest among i, j, k.
Multiplying the congruence
\[ (x - a)(x - b)(x - c)\left[(x - a)^i (x - b)^j (x - c)^k - 1\right] \equiv 0 \pmod p \]
by $(x - a)^{j+k}$ and using Fermat's little theorem, we deduce that
\[ f(x) = (x - a)(x - b)(x - c)\left[(x - b)^j (x - c)^k - (x - a)^{j+k}\right] \equiv 0 \pmod p \]
for all integers x. On the other hand, f has degree at most
\[ 3 + j + k - 1 \le 2 + \frac{2(p-1)}{3} < p \]
(for $p \ge 5$) and p different roots mod p. Thus f vanishes in $\mathbb{F}_p[X]$ and we
deduce the equality $(X - b)^j (X - c)^k = (X - a)^{j+k}$ in $\mathbb{F}_p[X]$. Note that
$j + k \ne 0$, as i < p - 1 and i + j + k = p - 1. Thus $(X - b)^j (X - c)^k$ vanishes
at b or c. But this is impossible, as by hypothesis $(X - a)^{j+k}$ does not vanish
at either b or c. □




The next problem is very exotic and tricky. 


27. Prove the existence of a number c > 0 with the following property: for
any prime p, there are at most $c\,p^{2/3}$ positive integers n such that p
divides n! + 1.


Chinese TST 2009 


Proof. Of course, if $p \mid n! + 1$, then $n \le p - 1$. Let p > 2 and let
$1 < n_1 < n_2 < \cdots < n_m < p$ be all solutions of the equation $n! \equiv -1 \pmod p$. Assume that
m > 1 (otherwise everything is clear). The congruences $n_i! \equiv -1 \pmod p$
and $n_{i+1}! \equiv -1 \pmod p$ imply that
\[ (n_i + 1)(n_i + 2) \cdots (n_i + n_{i+1} - n_i) \equiv 1 \pmod p. \]
Letting $k = n_{i+1} - n_i$, we see that $x = n_i$ is a solution to
\[ (x + 1)(x + 2) \cdots (x + k) \equiv 1 \pmod p. \]
Since the polynomial $(x + 1)(x + 2) \cdots (x + k) - 1 \in (\mathbb{Z}/p\mathbb{Z})[x]$ has at most k
distinct roots modulo p, it follows that for each $1 \le k < p$ there are at most
k indices i such that $n_{i+1} - n_i = k$. We will prove that this is enough to force
$m \le c\,p^{2/3}$.

Choose a positive integer j such that
\[ \frac{(j+1)(j+2)}{2} \ge m > \frac{j(j+1)}{2}. \]
Since $m - 1 \ge \frac{j(j+1)}{2} = \sum_{i=1}^{j} i$, when the differences $n_{i+1} - n_i$ are written in
ascending order, the first is at least 1, the next two are at least 2, and so on,
each time the next i differences are at least i (this is because for a fixed k,
$1 \le k < p$, the equation $n_{i+1} - n_i = k$ has at most k solutions i). Thus
\[ \sum_{i=1}^{m-1} (n_{i+1} - n_i) \ge \sum_{i=1}^{j} i^2 = \frac{j(j+1)(2j+1)}{6}. \]
We deduce that
\[ p > n_m - n_1 \ge \frac{j(j+1)(2j+1)}{6}. \]
In particular, $p > \frac{j^3}{3}$ and so $j < (3p)^{1/3}$. Since $m \le (j + 1)^2$, the result
follows. □
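For small primes one can list the solutions directly and watch the key combinatorial fact at work. This is only an illustrative sketch; the function names `solutions` and `gap_multiplicities_ok` are hypothetical, and nothing here replaces the proof.

```python
def solutions(p):
    """All n in [1, p-1] with n! = -1 (mod p)."""
    out, fact = [], 1
    for n in range(1, p):
        fact = fact * n % p
        if fact == p - 1:
            out.append(n)
    return out

def gap_multiplicities_ok(p):
    """Check that each gap k = n_{i+1} - n_i occurs at most k times."""
    ns = solutions(p)
    gaps = [b - a for a, b in zip(ns, ns[1:])]
    return all(gaps.count(k) <= k for k in set(gaps))

for p in (5, 7, 23, 101, 1009):
    ns = solutions(p)
    # number of solutions, the scale p^(2/3), and the gap check
    print(p, len(ns), round(p ** (2 / 3), 1), gap_multiplicities_ok(p))
```

By Wilson's theorem, n = p - 1 always appears in the list.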


Before passing to the next problem, let us recall that Bézout’s lemma 
does not hold in Z[X]. However, if f and g are polynomials with integer 
coefficients and with no common complex root, then they are relatively prime 
and by Bézout’s lemma in Q[X] there is a nonzero integer c and polynomials 
A,B € Z[X] such that Af + Bg = c. Usually |c| > 1 and it is not at all 
clear what is the smallest possible value of |c|, given f and g. The following
problem discusses the case $f = (X + 1)^n$ and $g = X^n + 1$. It is a challenging
problem, combining several ideas in a very tricky way.


28. Let n be an even positive integer. Find the least positive integer k for
which one can find polynomials with integer coefficients f, g such that
\[ f(X)(X + 1)^n + g(X)(X^n + 1) = k. \]
IMO Shortlist 1996 


Proof. Let us write $n = 2^r \cdot m$ for some odd integer m and assume that we
have
\[ f(X)(X + 1)^n + g(X)(X^n + 1) = k \]
for some $f, g \in \mathbb{Z}[X]$ and some positive integer k. Taking for X a root $z_j$ of
the polynomial $X^{2^r} + 1$, we deduce that $f(z_j)(z_j + 1)^n = k$. Multiplying all
these relations and taking into account that $\prod_{i=1}^{2^r} (1 + z_i) = 2$, it follows that
$\prod_{i=1}^{2^r} f(z_i) \cdot 2^n = k^{2^r}$. Since $\prod_{i=1}^{2^r} f(z_i)$ is an integer (by theorem 9.10), $2^n$
divides $k^{2^r}$ and so k must be a multiple of $2^m$. In particular, $k \ge 2^m$.

We will prove now that $k = 2^m$ works. Let us see what happens when
m = 1 first. We need to find polynomials f, g with integer coefficients such
that
\[ f(X)(X + 1)^{2^r} + g(X)(X^{2^r} + 1) = 2. \]
The idea is to find f such that
\[ f(z)(z + 1)^{2^r} = 2 \]
for some root z of $X^{2^r} + 1$. Indeed, since $X^{2^r} + 1$ is irreducible over the rational
numbers (because $(X + 1)^{2^r} + 1$ is Eisenstein for the prime 2), this would imply
that $X^{2^r} + 1$ divides $f(X)(X + 1)^{2^r} - 2$, which would give us g. The key point
is to take $z = e^{i\pi/2^r}$, because all the other roots $z_i$ of $X^{2^r} + 1$ are of the form
$z^j$, with odd j. Thus, if $z_1 = z, \dots, z_{2^r}$ are the roots of $X^{2^r} + 1$, then we can
write $z_i + 1 = (z + 1)Q_i(z)$ for some polynomials $Q_i$ with integer coefficients.
And since $\prod_{i=1}^{2^r} (1 + z_i) = 2$, it follows that $(1 + z)^{2^r} \prod_{i=1}^{2^r} Q_i(z) = 2$ which
gives us the polynomial f and finishes the proof in the case m = 1.

Finally, it is rather formal to deduce the general case from the case m = 1.
Namely, pick polynomials with integer coefficients f, g such that
\[ f(X)(X + 1)^{2^r} + g(X)(X^{2^r} + 1) = 2. \]
Then
\[ f(X)^m (X + 1)^n = \left(2 - g(X)(X^{2^r} + 1)\right)^m = 2^m + (X^{2^r} + 1)h(X) \]
for some $h \in \mathbb{Z}[X]$. The last equality follows from the binomial formula. Now,
replace X by $X^m$ in the previous equality, to get
\[ f(X^m)^m (X^m + 1)^n = 2^m + (X^n + 1)h(X^m) \]
and observe that $(X^m + 1)^n = (X + 1)^n A(X)$ for some $A \in \mathbb{Z}[X]$ (because m
is odd). The conclusion is now clear. □
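To make the construction concrete, here is a small check for n = 2 and n = 6, written as a sketch with hand-rolled polynomial arithmetic on coefficient lists (lowest degree first). The helpers `pmul`, `ppow`, `prem` and the explicit choice f = -X, g = X + 2 for n = 2 are ours, not taken from the text.

```python
def pmul(a, b):
    """Product of two integer polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def ppow(a, e):
    """a raised to the nonnegative integer power e."""
    out = [1]
    for _ in range(e):
        out = pmul(out, a)
    return out

def prem(a, b):
    """Remainder of a modulo a monic polynomial b."""
    a = a[:]
    for i in range(len(a) - len(b), -1, -1):
        q = a[i + len(b) - 1]
        for j, y in enumerate(b):
            a[i + j] -= q * y
    return a[:len(b) - 1]

# n = 2 (r = 1, m = 1): f = -X and g = X + 2.
f, g = [0, -1], [2, 1]
lhs = [u + v for u, v in zip(pmul(f, ppow([1, 1], 2)), pmul(g, [1, 0, 1]))]
print(lhs)  # coefficients of f(X)(X+1)^2 + g(X)(X^2+1)

# n = 6 (r = 1, m = 3): F(X) = f(X^3)^3 * A(X), where A(X) = (X^2 - X + 1)^6
# satisfies (X^3 + 1)^6 = (X + 1)^6 * A(X).
F = pmul(ppow([0, 0, 0, -1], 3), ppow([1, -1, 1], 6))
rem6 = prem(pmul(F, ppow([1, 1], 6)), [1, 0, 0, 0, 0, 0, 1])
print(rem6)  # remainder of F(X)(X+1)^6 modulo X^6 + 1
```

A constant remainder of $2^m$ modulo $X^n + 1$ is exactly what the last step of the proof produces, the quotient giving the polynomial g up to sign.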


The idea of the following problem is very natural, but there are a lot of 
technical details one has to deal with, which makes the proof rather long. 


29. Suppose that f is a polynomial of degree at least 2, with positive leading 
coefficient and integer coefficients. Show that there are infinitely many 
n such that f(n!) is composite. 


IMO Shortlist 2005 


Proof. We will try first to find prime numbers p and positive integers n such
that $p \mid f(n!)$. Then, we will ensure that n is large enough and finally we will
have to get rid of the cases f(n!) = 0, p, -p. Write
\[ f(X) = a_d X^d + a_{d-1} X^{d-1} + \cdots + a_0, \]
with $a_d > 0$ and $d \ge 2$. Note that we may assume that $a_0 \ne 0$, otherwise the
problem is trivial.

First, let us consider the equation $f(n!) \equiv 0 \pmod p$. Unless p divides
$a_0$, this forces n < p. So, let us look for n = p - k with k > 0. We have to
compute first $(p - k)! \pmod p$, which is very easy by Wilson's theorem:
\[ -1 \equiv (p-1)! = (p-k)!\,(p-k+1) \cdots (p-1) \equiv (p-k)!\,(-1)^{k-1}(k-1)! \pmod p. \]
Thus, we have $f(n!) \equiv 0 \pmod p$ if and only if $p \mid x_k$, where
\[ x_k = a_0 (k-1)!^{d} + a_1 (k-1)!^{d-1} (-1)^k + \cdots + a_d (-1)^{kd}. \]
We will prove first the existence of large prime factors of $x_k$, more precisely
such that $p \ge k$. This is the content of the following

Lemma 10.13. There exists $k_0$ such that for all $k > k_0$, there exists a prime
factor $p_k$ of $x_k$ such that $p_k \ge k$.

Proof. This is easy: choose $k_1$ such that $v_p((k_1 - 1)!) > v_p(a_d)$ for all primes
$p \le |a_d|$. If all prime factors p of $x_k$ are less than k for some $k > k_1$, they divide
(k - 1)! and $x_k$, so they divide $a_d$. But for such a prime p, since $v_p((k-1)!) > v_p(a_d)$,
we must have $v_p(x_k) = v_p(a_d)$. We deduce that $|x_k| \le |a_d|$. Now,
choose $k_0 > k_1$ such that $|x_k| > |a_d|$ for all $k > k_0$, which is possible as
$a_0 \ne 0$. □

Fix now $k_0$ and $p_k$ as in the lemma. Fix also a positive integer N and
assume that none of the numbers f(n!) with $n \ge N$ is composite. By increasing
N, we may assume that $x \mapsto f(x!) - x$ is increasing on $[N, \infty)$. By construction,
$p_k$ divides $f((p_k - k)!)$. Thus, if $p_k - k \ge N$, then we must have $f((p_k - k)!) = p_k$
and this will happen if we ensure that k, k + 1, ..., k + N - 1 are composite.
To have this, we can choose $k = k_a = a(N + 1)! + 2$ for $a \ge 1$. Denoting
$x_a = p_{k_a} - k_a$, we deduce that $f(x_a!) = x_a + a(N + 1)! + 2$ for all sufficiently
large a (so that $k_a > k_0$). In particular, the last relation shows that $x_a \to \infty$,
because the map $a \mapsto x_a$ is injective. In particular, for infinitely many a we
have $x_{a+1} \ge x_a + 1$ and so
\[ f(x_a!) - x_a + (N + 1)! = f(x_{a+1}!) - x_{a+1} \ge f((x_a + 1)!) - (x_a + 1). \]
This implies that
\[ f((x_a + 1)!) - f(x_a!) \le 1 + (N + 1)! \]
which is certainly impossible because $f((x_a + 1)!) - f(x_a!) \to \infty$ for $a \to \infty$. Thus our
assumption was wrong and at least one of the numbers f(n!) with $n \ge N$ is
composite. Since N was arbitrary, the conclusion follows. □


Remark 10.14. The hypothesis that $d \ge 2$ was not used in the proof, so the
result still holds for linear polynomials with positive leading coefficient. For
instance, this also solves the following Chinese TST 2011 problem: prove that
for all positive integers d there are infinitely many n such that $d \cdot n! - 1$ is
composite.


10.6 Notes 


The following people helped us with solutions and we would like to thank
them: Alexandru Chirvasitu (problem 24), Xiangyi Huang (problems 6, 12,
27), Holden Lee (problem 22), Mitchell Lee (problem 2), Fedja Nazarov (problem 16),
Hunter Spink (problem 10), Richard Stong (problems 7, 8, 20, 23,
25), Qiaochu Yuan (problem 1), Victor Wang (problem 14), Gjergji Zaimi
(problems 3, 5), Alex Zhu (problem 14).


Chapter 11 


Lagrange Interpolation 
Formula 


It is a standard fact that a nonzero polynomial f with coefficients in a 
field K has at most deg f roots in K. Lagrange’s interpolation formula is in 
some sense a refinement of this standard result: it makes precise the way in 
which deg f + 1 points of K determine the polynomial f. 


Theorem 11.1. Let K be a field and let $x_0, x_1, \dots, x_n$ be distinct elements of
K. Then for any polynomial $f \in K[X]$ of degree at most n we have
\[ f(X) = \sum_{k=0}^{n} f(x_k) \prod_{j \ne k} \frac{X - x_j}{x_k - x_j}. \]

Proof. Simply note that the difference between the polynomials in both sides
of the desired equality is a polynomial of degree at most n which vanishes at
$x_0, x_1, \dots, x_n$, thus it is the zero polynomial. □
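The formula translates directly into a few lines of code. The sketch below works over the rationals with exact arithmetic; the function name `lagrange_eval` and the sample polynomial are our own choices for illustration.

```python
from fractions import Fraction

def lagrange_eval(points, values, t):
    """Evaluate at t the polynomial of degree <= n taking values[k] at points[k].

    points must be pairwise distinct integers (or Fractions).
    """
    total = Fraction(0)
    for k, xk in enumerate(points):
        term = Fraction(values[k])
        for j, xj in enumerate(points):
            if j != k:
                term *= Fraction(t - xj, xk - xj)
        total += term
    return total

f = lambda x: 3 * x**3 - x + 7          # degree 3, so 4 points determine it
pts = [-2, 0, 1, 5]
print(lagrange_eval(pts, [f(x) for x in pts], 10), f(10))
```

Both printed numbers agree, as the theorem predicts for any polynomial of degree at most 3 and any 4 distinct points.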


The following corollary is an immediate consequence of the previous the- 
orem, but it is rather useful in practice, especially when proving complicated 
identities. 




Corollary 11.2. Let $f \in K[X]$ be a polynomial of degree at most n and let
$x_0, x_1, \dots, x_n \in K$ be distinct elements. Then
\[ \sum_{k=0}^{n} \frac{f(x_k)}{\prod_{j \ne k} (x_k - x_j)} \]
is 0 if deg f < n and equals the leading coefficient of f otherwise.
Proof. Simply identify the coefficients of $X^n$ in Lagrange's identity. □
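A quick numerical sanity check of the corollary, again as a sketch with exact rational arithmetic (the name `weighted_sum` is hypothetical):

```python
from fractions import Fraction

def weighted_sum(f, points):
    """Compute sum_k f(x_k) / prod_{j != k} (x_k - x_j) for distinct integer points."""
    total = Fraction(0)
    for k, xk in enumerate(points):
        denom = 1
        for j, xj in enumerate(points):
            if j != k:
                denom *= xk - xj
        total += Fraction(f(xk), denom)
    return total

pts = [-3, 0, 2, 7]                                     # n = 3
print(weighted_sum(lambda x: 5 * x**3 - 2 * x + 1, pts))  # degree exactly n
print(weighted_sum(lambda x: x**2 + 4, pts))              # degree < n
```

The first value is the leading coefficient of the cubic and the second is 0, independently of the chosen points.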


The first two problems are direct applications of Lagrange’s interpolation 
formula. 


1. A polynomial p of degree n satisfies $p(k) = 2^k$ for all $0 \le k \le n$. Find
its value at n + 1.


Murray Klamkin


Proof. Using theorem 11.1, we obtain
\[ p(n+1) = \sum_{k=0}^{n} 2^k \prod_{j \ne k} \frac{n + 1 - j}{k - j} = \sum_{k=0}^{n} (-1)^{n-k} \binom{n+1}{k} 2^k = 2^{n+1} - 1. \]
□


Remark 11.3. There is also a neat solution without use of the interpolation
formula: consider the polynomial
\[ f(X) = \sum_{k=0}^{n} \binom{X}{k}. \]
It has degree n and satisfies $f(k) = 2^k$ for all $0 \le k \le n$ by the binomial
formula. Thus we must have f = p and then clearly $p(n + 1) = 2^{n+1} - 1$. Even
though very neat, this solution is not so conceptual.
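The polynomial of the remark can be evaluated at nonnegative integers with `math.comb`, which gives a direct check of the answer to problem 1. This is a sketch; the two-argument function `f(x, n)` is our own packaging of the sum.

```python
from math import comb

def f(x, n):
    """Value at a nonnegative integer x of sum_{k=0}^{n} C(x, k)."""
    return sum(comb(x, k) for k in range(n + 1))

for n in range(1, 8):
    # f agrees with 2^k at k = 0, 1, ..., n ...
    assert all(f(k, n) == 2**k for k in range(n + 1))
    # ... and falls short of 2^(n+1) by exactly 1 at n + 1.
    print(n, f(n + 1, n), 2 ** (n + 1) - 1)
```

The missing term at x = n + 1 is C(n+1, n+1) = 1, which explains the defect of 1.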




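2. A polynomial f of degree n satisfies
\[ f(k) = \binom{n+1}{k}^{-1} \]
for all $0 \le k \le n$. Find f(n + 1).
Titu Andreescu, IMO Shortlist 1981


Proof. This is also immediate using theorem 11.1:
\[ f(n+1) = \sum_{k=0}^{n} f(k) \prod_{j \ne k} \frac{n + 1 - j}{k - j} = \sum_{k=0}^{n} (-1)^{n-k} = \frac{1 + (-1)^n}{2}. \]
□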


3. Prove that for any real number a we have the following identity
\[ \sum_{k=0}^{n} (-1)^k \binom{n}{k} (a - k)^n = n!. \]
Tepper's identity
Proof. Use corollary 11.2 for the polynomial $P(X) = (a - X)^n$ and the points
$x_k = k$. □
4. Let
\[ S_p = \sum_{k=0}^{n} \frac{x_k^{n+p}}{\prod_{j \ne k} (x_k - x_j)}. \]
Prove that $S_1 = x_0 + x_1 + \cdots + x_n$ and compute $S_2$.




Proof. Actually, the following method shows how to compute $S_p$ in general.
The idea is to consider the remainder f(X) of $X^{n+p}$ when divided by
\[ \prod_{j=0}^{n} (X - x_j) \]
and to apply Lagrange's interpolation formula to it. Identifying the coefficients
of $X^n$ and using the fact that $f(x_i) = x_i^{n+p}$, we deduce that $S_p$ is precisely
the coefficient of $X^n$ in f(X). Now, for p = 1 we clearly have
\[ f(X) = X^{n+1} - \prod_{j=0}^{n} (X - x_j), \]
so that its leading coefficient is $\sum_{i=0}^{n} x_i$. On the other hand, for p = 2 we
must have
\[ X^{n+2} = Q(X) \prod_{j=0}^{n} (X - x_j) + f(X) \]
for some polynomial Q. Comparing degrees and leading coefficients shows that
Q(X) = X + c for some constant c. To determine c, we impose the condition
that $\deg(f) \le n$. This implies that $c = \sum_{j=0}^{n} x_j$. Then, it is easy to find
the coefficient of $X^n$ in f(X) and the answer is
\[ S_2 = \Big(\sum_{j=0}^{n} x_j\Big)^2 - \sum_{0 \le i < j \le n} x_i x_j. \]
Actually, we leave as a nice exercise for the reader to prove that for all p we
have
\[ S_p = \sum_{a_0 + a_1 + \cdots + a_n = p} x_0^{a_0} x_1^{a_1} \cdots x_n^{a_n}, \]
where in the sum above the $a_i$ are nonnegative integers. □
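The closed forms for $S_1$ and $S_2$, and the general formula left as an exercise, can be checked on a sample tuple. The sketch below uses exact fractions; `S` and `h` are hypothetical helper names, with `h` enumerating the monomials of total degree p.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

def S(p, xs):
    """S_p = sum_k x_k^(n+p) / prod_{j != k} (x_k - x_j), with n = len(xs) - 1."""
    n = len(xs) - 1
    total = Fraction(0)
    for k, xk in enumerate(xs):
        denom = prod(xk - xj for j, xj in enumerate(xs) if j != k)
        total += Fraction(xk ** (n + p), denom)
    return total

def h(p, xs):
    """Sum of all monomials of total degree p in the entries of xs."""
    return sum(prod(c) for c in combinations_with_replacement(xs, p))

xs = [2, -1, 3, 7]
e1 = sum(xs)
e2 = sum(xs[i] * xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))
print(S(1, xs), e1)
print(S(2, xs), e1**2 - e2)
print([S(p, xs) == h(p, xs) for p in range(5)])
```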


The following problems deal with extremal properties of polynomials. The 
underlying philosophy is that imposing conditions on the values of a polyno- 
mial at sufficiently many points of an interval automatically imposes conditions 
at all other values. Lagrange’s interpolation formula is a very handy tool in 
such situations, but there is an extra ingredient which appears all the time, 
namely the Chebyshev polynomials. Recall that the nth Chebyshev polynomial
$T_n$ is the unique polynomial of degree n such that $\cos(nx) = T_n(\cos x)$
for all $x \in \mathbb{R}$. It is not really obvious that $T_n$ exists, but the reader can
easily check the existence inductively, by establishing the recurrence relation
$T_{n+1}(X) + T_{n-1}(X) = 2X\,T_n(X)$. This also implies that the leading coefficient
of $T_n$ is $2^{n-1}$ (for $n \ge 1$). A fundamental theorem of Chebyshev states that for any
monic polynomial $f \in \mathbb{R}[X]$ of degree n we have $\max_{x \in [-1,1]} |f(x)| \ge \frac{1}{2^{n-1}}$,
with equality if and only if $f = \frac{1}{2^{n-1}} T_n$. This explains why Chebyshev polynomials
are so important. For a proof of Chebyshev's theorem, see problem
14 in this chapter.
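The recurrence gives an immediate way to generate the coefficients of $T_n$. The following sketch (with the hypothetical name `chebyshev`) stores a polynomial as a list of coefficients, lowest degree first, and compares $T_5(\cos t)$ with $\cos(5t)$ at one sample angle.

```python
import math

def chebyshev(n):
    """Coefficients of T_n (lowest degree first) via T_{m+1} = 2X T_m - T_{m-1}."""
    prev, cur = [1], [0, 1]                  # T_0 = 1, T_1 = X
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]     # 2X * T_m
        for i, c in enumerate(prev):
            nxt[i] -= c                      # subtract T_{m-1}
        prev, cur = cur, nxt
    return cur

print(chebyshev(3))                          # coefficients of 4X^3 - 3X
t = 0.7
value = sum(c * math.cos(t) ** k for k, c in enumerate(chebyshev(5)))
print(abs(value - math.cos(5 * t)) < 1e-12)
```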





5. Let a, b, c be real numbers and let $f(x) = ax^2 + bx + c$ be such that
\[ \max\{|f(\pm 1)|, |f(0)|\} \le 1. \]
Prove that if $|x| \le 1$ then $|f(x)| \le \frac{5}{4}$ and $\left|x^2 f\left(\frac{1}{x}\right)\right| \le 2$.
Spain 1996
Proof. Using Lagrange interpolation, we can write
\[ f(X) = f(0)(1 - X^2) + f(1)\,\frac{X^2 + X}{2} + f(-1)\,\frac{X^2 - X}{2}. \]
We deduce that for all $|x| \le 1$ we have
\[ |f(x)| \le 1 - x^2 + \frac{|x^2 + x|}{2} + \frac{|x^2 - x|}{2} = 1 - x^2 + |x| \le \frac{5}{4}, \]
the last inequality being equivalent to $\left(|x| - \frac{1}{2}\right)^2 \ge 0$. Similarly, we find that
\[ \left|x^2 f(1/x)\right| \le 1 - x^2 + \frac{|1 + x|}{2} + \frac{|1 - x|}{2} = 2 - x^2 \le 2. \]
□


6. Find the maximal value of the expression $a^2 + b^2 + c^2$ if $|ax^2 + bx + c| \le 1$
for all $x \in [-1, 1]$.


Laurentiu Panaitopol


Proof. Let $f(x) = ax^2 + bx + c$ and define
\[ A = f(1), \quad B = f(0), \quad C = f(-1). \]
Then we easily obtain
\[ a = \frac{A + C}{2} - B, \quad b = \frac{A - C}{2}, \quad c = B. \]
Therefore, an immediate computation gives
\[ a^2 + b^2 + c^2 = \frac{A^2 + C^2}{2} + 2B^2 - B(A + C). \]
Since $|A|, |B|, |C| \le 1$, the last expression is clearly bounded above by 5 (use
the obvious estimate for each term of it). Thus, we always have $a^2 + b^2 + c^2 \le 5$.
To see that it is optimal, simply take Chebyshev's polynomial $2X^2 - 1$. □


7. Define $F(a, b, c) = \max_{x \in [0,3]} |x^3 - ax^2 - bx - c|$. What is the least possible
value of this function over $\mathbb{R}^3$?


Chinese TST 2001


Proof. Let $P_{a,b,c}(X) = X^3 - aX^2 - bX - c$. The idea is to map the interval
[0, 3] to [-1, 1] via an affine map and then to use Chebyshev's least deviation
theorem in order to bound from below $\max_{x \in [0,3]} |P_{a,b,c}(x)|$. Note that
\[ \max_{x \in [0,3]} |P_{a,b,c}(x)| = \max_{x \in [-1,1]} \left| P_{a,b,c}\left(\frac{3(x+1)}{2}\right) \right|. \]
Since $P_{a,b,c}\left(\frac{3(X+1)}{2}\right)$ is a polynomial of third degree with leading coefficient
27/8, Chebyshev's theorem gives us the estimate
\[ \max_{x \in [-1,1]} \left| P_{a,b,c}\left(\frac{3(x+1)}{2}\right) \right| \ge \frac{27}{8} \cdot \frac{1}{4} = \frac{27}{32}. \]
Thus $F(a, b, c) \ge \frac{27}{32}$ and this is optimal, since equality holds for
\[ P_{a,b,c}(X) = \frac{27}{32}\, T_3\left(\frac{2X}{3} - 1\right), \]
where $T_3(X) = 4X^3 - 3X$. □




Remark 11.4. For third degree polynomials, Chebyshev's theorem is very easy
to prove using Lagrange's interpolation formula: if f is a monic polynomial of
third degree, identifying leading coefficients in Lagrange's formula (at the points $\pm 1, \pm\frac{1}{2}$) yields the
equality
\[ 1 = \frac{2f(1)}{3} - \frac{4f(1/2)}{3} + \frac{4f(-1/2)}{3} - \frac{2f(-1)}{3}. \]
Of course, this can also be trivially checked by hand. This equality and the
triangle inequality imply that $\max_{x \in [-1,1]} |f(x)| \ge \frac{1}{4}$ for any such f, with
equality when $f(X) = X^3 - \frac{3}{4}X$.





8. Let $a, b, c, d \in \mathbb{R}$ such that $|ax^3 + bx^2 + cx + d| \le 1$ for all $x \in [-1, 1]$.
What is the maximal value of |c|? For which polynomials is the maximum
attained?


Gabriel Dospinescu


Proof. Write $f(X) = aX^3 + bX^2 + cX + d$. Choose distinct numbers $x_0, x_1, x_2, x_3$ in [-1, 1] and identify the coefficients of
X in Lagrange's formula
\[ aX^3 + bX^2 + cX + d = \sum_{k=0}^{3} f(x_k) \prod_{j \ne k} \frac{X - x_j}{x_k - x_j}. \]
We deduce that
\[ |c| = \left| \sum f(x_0)\, \frac{x_1 x_2 + x_2 x_3 + x_3 x_1}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)} \right| \le \sum \left| \frac{x_1 x_2 + x_2 x_3 + x_3 x_1}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)} \right|, \]
where each sum has four terms, obtained by letting each of the four points play the role of $x_0$.
The problem is to find a 4-tuple $(x_0, x_1, x_2, x_3)$ which minimizes the last expression.
As usual in this kind of problems, a good idea is to choose the points
where $|T_3(x)|$ takes its maximal value 1 on the interval [-1, 1]. These are the
points $x_0 = -1$, $x_1 = -\frac{1}{2}$, $x_2 = \frac{1}{2}$ and $x_3 = 1$. It is easy to compute the last
sum in this case and we find that $|c| \le 3$. Since this value is attained for
the polynomial $T_3(X) = 4X^3 - 3X$, this is the maximal value. Also, it is not
difficult to check that equality appears in the above chain of inequalities only
for $T_3$ and $-T_3$. □













9. If a polynomial $f \in \mathbb{R}[X]$ of degree n satisfies $|f(x)| \le 1$ for all $x \in [0, 1]$,
then
\[ \left| f\left(-\frac{1}{n}\right) \right| \le 2^{n+1} - 1. \]

Proof. We will use Lagrange's interpolation formula at the points k/n for
$0 \le k \le n$. We have
\[ f\left(-\frac{1}{n}\right) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \prod_{j \ne k} \frac{-(j+1)}{k - j}, \]
which shows that
\[ \left| f\left(-\frac{1}{n}\right) \right| \le \sum_{k=0}^{n} \prod_{j \ne k} \frac{j+1}{|k - j|} = \sum_{k=0}^{n} \binom{n+1}{k+1} = 2^{n+1} - 1. \]
□


10. Let $a \ge 3$ be a real number and let p be a real polynomial of degree n.
Prove that
\[ \max_{i = 0, 1, \dots, n+1} |a^i - p(i)| \ge 1. \]

Proof. The crucial observation is that we have
\[ \sum_{k=0}^{n+1} (-1)^k \binom{n+1}{k} p(k) = 0. \]
This is simply Lagrange's interpolation formula written in a slightly changed
form (though a more conceptual way of seeing this is using the finite differences
theory). We deduce from this and the binomial formula that
\[ (a - 1)^{n+1} = \sum_{k=0}^{n+1} (-1)^{n+1-k} \binom{n+1}{k} \left(a^k - p(k)\right). \]
Thus, if $|p(k) - a^k| < 1$ for all $0 \le k \le n + 1$, we must have
\[ (a - 1)^{n+1} < \sum_{k=0}^{n+1} \binom{n+1}{k} = 2^{n+1}, \]
contradicting the fact that $a \ge 3$. This finishes the solution. □




Remark 11.5. The identity
\[ \sum_{k=0}^{n+1} (-1)^k \binom{n+1}{k} f(k) = 0 \]
for polynomials of degree at most n is extremely useful. We proved it here
as a consequence of Lagrange's interpolation formula, but it also follows from
the theory of finite differences, for a glimpse of which we refer the reader to
section 10.3.
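The identity is easy to test on examples. The sketch below (the name `alternating_sum` is ours) also shows that the degree bound is sharp: for the monomial of degree n + 1 the alternating sum no longer vanishes.

```python
from math import comb, factorial

def alternating_sum(p, n):
    """Compute sum_{k=0}^{n+1} (-1)^k C(n+1, k) p(k)."""
    return sum((-1) ** k * comb(n + 1, k) * p(k) for k in range(n + 2))

n = 4
print(alternating_sum(lambda x: 3 * x**4 - x**2 + 5, n))   # degree <= n
print(alternating_sum(lambda x: x**5, n), -factorial(5))   # degree n + 1
```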


11. Let $a, b, c, d \in \mathbb{R}$ such that $|ax^3 + bx^2 + cx + d| \le 1$ for all $x \in [-1, 1]$.
Prove that
\[ |a| + |b| + |c| + |d| \le 7. \]


IMO Shortlist 1996


Proof. Let us look at the values of $P(X) = aX^3 + bX^2 + cX + d$ at
-1, -1/2, 1/2, 1, which are the classical interpolation points in Chebyshev's
theorem (for n = 3). Writing
\[ A = P(1), \quad B = P(1/2), \quad C = P(-1/2), \quad D = P(-1), \]
we can easily express a, b, c, d in terms of A, B, C, D. An easy computation
gives
\[ a = \frac{2}{3}(A - D) - \frac{4}{3}(B - C), \quad b = \frac{2}{3}(A + D) - \frac{2}{3}(B + C), \]
and
\[ c = -\frac{1}{6}(A - D) + \frac{4}{3}(B - C), \quad d = -\frac{1}{6}(A + D) + \frac{2}{3}(B + C). \]
This shows that |a| + |b| + |c| + |d| is actually a convex function
of $A, B, C, D \in [-1, 1]$. Thus it attains its maximum when all A, B, C, D are
equal to 1 or -1. Now, it is a tedious matter to check that in all cases the
expression is at most 7. We have equality for the Chebyshev polynomial (or
its opposite) of degree 3. □
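The tedious verification over the 16 sign patterns is a natural task for a few lines of code. This sketch uses exact fractions and assumes the formulas for a, b, c, d in terms of A = P(1), B = P(1/2), C = P(-1/2), D = P(-1) obtained by solving the corresponding linear system; the function name `coefficients` is hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

def coefficients(A, B, C, D):
    """Coefficients (a, b, c, d) of the cubic with P(1)=A, P(1/2)=B, P(-1/2)=C, P(-1)=D."""
    a = Fr(2, 3) * (A - D) - Fr(4, 3) * (B - C)
    b = Fr(2, 3) * (A + D) - Fr(2, 3) * (B + C)
    c = -Fr(1, 6) * (A - D) + Fr(4, 3) * (B - C)
    d = -Fr(1, 6) * (A + D) + Fr(2, 3) * (B + C)
    return a, b, c, d

# Maximum of |a| + |b| + |c| + |d| over the 16 vertices of the cube [-1, 1]^4.
best = max(sum(abs(t) for t in coefficients(*signs))
           for signs in product((1, -1), repeat=4))
print(best)
print(coefficients(1, -1, 1, -1))   # values of T_3 at 1, 1/2, -1/2, -1
```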


12. Let
\[ A = \left\{ p \in \mathbb{R}[X] \;\middle|\; \deg p \le 3,\ |p(\pm 1)| \le 1,\ \left|p\left(\pm\tfrac{1}{2}\right)\right| \le 1 \right\}. \]
Find $\sup_{p \in A} \max_{|x| \le 1} |p''(x)|$.


IMC 1998


Proof. Write $p(X) = aX^3 + bX^2 + cX + d$. Since $p''(x)$ is an affine function
on [-1, 1], the maximum of its absolute value is obtained for x = 1 or x = -1.
Thus, we need to bound in an optimal way the number
\[ \max(|6a + 2b|, |-6a + 2b|). \]
Taking the Chebyshev polynomial shows that this can be already as large as
24. But it is always at most 24 (under the given hypothesis on p), since for
instance using the formulas in the solution of problem 11 yields
\[ 6a + 2b = \frac{16}{3}\,p(1) - \frac{8}{3}\,p(-1) - \frac{28}{3}\,p(1/2) + \frac{20}{3}\,p(-1/2). \]
Thus $|6a + 2b| \le 24$ by the triangle inequality. We proceed in a similar way
to bound $|-6a + 2b|$ or we can observe that if p satisfies the conditions in the
problem, so does p(-x). □


13. Let $n \ge 3$ and let $f, g \in \mathbb{R}[X]$ be polynomials such that the points
(f(1), g(1)), (f(2), g(2)), ..., (f(n), g(n)) are the vertices of a regular n-gon
in counterclockwise order. Prove that $\max(\deg f, \deg g) \ge n - 1$.


Putnam 2008


Proof. It is enough to prove that P(X) = f(X) + ig(X) has degree at least
n - 1. Clearly, we may assume that the regular n-gon has vertices $z, z^2, \dots, z^n$
for $z = e^{2i\pi/n}$ (simply apply a translation and a homothety to reduce the general
case to this one). So, it remains to prove that if a polynomial P satisfies
$P(i) = z^i$ for all $1 \le i \le n$, then $\deg P \ge n - 1$. Assume that this is not the
case. Using Lagrange's interpolation formula at the points 2, 3, ..., n, we obtain
\[ z = P(1) = \sum_{i=2}^{n} z^i \prod_{j \ne i} \frac{1 - j}{i - j} = \sum_{i=2}^{n} (-1)^i \binom{n-1}{i-1} z^i. \]
Dividing both sides by z and using the binomial formula yields $(1 - z)^{n-1} = 0$,
a contradiction. □


Remark 11.6. There are many other approaches. The most efficient is the
method of finite differences, which yields
\[ \Delta^{n-1} P(1) = \sum_{j=0}^{n-1} (-1)^j \binom{n-1}{j} P(n - j) = z(z - 1)^{n-1} \ne 0. \]
Another very neat proof considers the polynomial P(X + 1) - zP(X). It has
degree at most n - 2, it is clearly nonzero and vanishes at 1, 2, ..., n - 1, a
contradiction.


We leave the land of routine problems in the Lagrange interpolation for- 
mula and tackle some more delicate results. The following is taken from [66]. 


14. Let f be a complex polynomial of degree n such that $|f(x)| \le 1$ for all
$x \in [-1, 1]$. Prove that $|f^{(k)}(x)| \le |T_n^{(k)}(x)|$ for all k and all real numbers
x such that $|x| \ge 1$. Deduce Chebyshev's theorem: if f is monic of degree
n, then
\[ \max_{x \in [-1,1]} |f(x)| \ge \frac{1}{2^{n-1}}. \]


W.W. Rogosinski


Proof. Pick for the moment n + 1 points $x_0, \dots, x_n$ in [-1, 1] and apply Lagrange's
interpolation formula:
\[ f(X) = \sum_{i=0}^{n} f(x_i) \prod_{j \ne i} \frac{X - x_j}{x_i - x_j}. \]
Next, differentiate this equality k times, so if we set $P_i(X) = \prod_{j \ne i} (X - x_j)$,
then
\[ f^{(k)}(x) = \sum_{i=0}^{n} \frac{f(x_i)}{P_i(x_i)}\, P_i^{(k)}(x) \]
for all x. Choose any $|x| \ge 1$ and apply the triangle inequality to the previous
identity, together with the hypothesis that $|f(x_i)| \le 1$. We infer that
\[ |f^{(k)}(x)| \le \sum_{i=0}^{n} \left| \frac{P_i^{(k)}(x)}{P_i(x_i)} \right|. \]
But we also have
\[ T_n^{(k)}(x) = \sum_{i=0}^{n} \frac{T_n(x_i)}{P_i(x_i)}\, P_i^{(k)}(x), \]
so it would be really nice if we could ensure that
\[ \left| \sum_{i=0}^{n} T_n(x_i)\, \frac{P_i^{(k)}(x)}{P_i(x_i)} \right| = \sum_{i=0}^{n} \left| \frac{P_i^{(k)}(x)}{P_i(x_i)} \right|. \]
In order to have this, we already need $|T_n(x_i)| = 1$ for all i, which suggests
taking the old friends $x_i = \cos\left(\frac{i\pi}{n}\right)$ for $0 \le i \le n$. Then $T_n(x_i) = (-1)^i$ and
we would like to prove that the numbers $(-1)^i \cdot \frac{P_i^{(k)}(x)}{P_i(x_i)}$ have the same sign.
If this were true, we would be done by the previous arguments. Note that
$x_0 > x_1 > \cdots > x_n$, so that
\[ P_i(x_i) = (x_i - x_0)(x_i - x_1) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n) \]
has the sign $(-1)^i$. So, it remains to see if the signs of the numbers $P_i^{(k)}(x)$
are independent of i as long as $|x| \ge 1$. The crucial remark is that the $P_i^{(k)}$ have
all their roots in [-1, 1]. Indeed, this follows from Rolle's theorem and the
fact that the $P_i$ have all their roots in [-1, 1]. But then it is clear that the sign
of $P_i^{(k)}(x)$ only depends on the degree of $P_i^{(k)}$ (which is independent of i) and
on the position of x with respect to 1 (i.e. whether $x \ge 1$ or $x \le -1$). So, it is
independent of i and we are done.

Finally, let us prove Chebyshev's theorem using this result. Suppose that
$|f(x)| < \frac{1}{2^{n-1}}$ for all $x \in [-1, 1]$. By compactness there exists $c > 2^{n-1}$ such
that $|cf(x)| \le 1$ for all $x \in [-1, 1]$. Using the result established above, we
deduce that for $|x| \ge 1$ we have $|cf(x)| \le |T_n(x)|$. As f is monic of
degree n and since $T_n$ has degree n and leading coefficient $2^{n-1}$, letting $x \to \infty$ yields
$c \le 2^{n-1}$, a contradiction. □







15. The polynomials $f, g \in \mathbb{R}[X]$ and $a \in \mathbb{R}[X, Y]$ satisfy
\[ f(x) - f(y) = a(x, y)(g(x) - g(y)) \]
for all x, y. Prove that we can find a polynomial h such that
f(x) = h(g(x)).
Russia 2004


Proof. Fix an integer N > deg f. If g is constant, so is f and everything is
clear. So, assume that g is not constant, thus we can choose N + 1 distinct
numbers $x_0, x_1, \dots, x_N$ at which g takes different values. By Lagrange's interpolation
formula, there exists a polynomial h of degree at most N such that
$h(g(x_i)) = f(x_i)$ for all $0 \le i \le N$. Note that by hypothesis
\[ f(X) - f(x_i) = a(X, x_i)(g(X) - g(x_i)), \]
so that $g(X) - g(x_i)$ divides $f(X) - f(x_i)$. On the other hand, $g(X) - g(x_i)$
also divides $h(g(X)) - h(g(x_i)) = h(g(X)) - f(x_i)$ (this follows by linearity
and the fact that $g(X) - g(x_i)$ divides $g(X)^k - g(x_i)^k$ for all $k \ge 1$). Thus
h(g(X)) - f(X) is a multiple of $g(X) - g(x_i)$. Now, the polynomials $g(X) - g(x_i)$
are clearly pairwise relatively prime, since the numbers $g(x_i)$ are pairwise
distinct. Thus h(g(X)) - f(X) is a multiple of $\prod_{i=0}^{N} (g(X) - g(x_i))$, but has
degree smaller than that of $\prod_{i=0}^{N} (g(X) - g(x_i))$. Therefore f(X) = h(g(X))
and we are done. □


The next problem presents P.J. O’Hara’s beautiful proof [63] of a famous 
inequality of Bernstein. 


16. a) Prove that for all complex polynomials f having degree at most n 
and for all x € C 


i n le 22 
zf (z) = siz) +> 2S (eG 


where z, are the roots of the polynomial X” + 1. 


534 Chapter 11. Lagrange Interpolation Formula 


b) Deduce Bernstein’s inequality: for all complex polynomials f we 
have ||f'|| < n| fll, where || fl] = max |f (2). 


P.J. O'Hara 


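Proof. a) The polynomial

$$g_x(X) = \frac{f(xX) - f(x)}{X - 1}$$

has degree at most $n - 1$ and it is easy to see that $g_x(1) = x f'(x)$. Lagrange's
interpolation formula for the points $z_i$ yields

$$g_x(X) = \sum_{i=1}^{n} g_x(z_i) \prod_{j \ne i} \frac{X - z_j}{z_i - z_j},$$

which combined with the equality¹

$$\prod_{j \ne i} (z_i - z_j) = n z_i^{n-1} = -\frac{n}{z_i}$$

gives

$$g_x(X) = -\frac{X^n + 1}{n} \sum_{i=1}^{n} g_x(z_i)\, \frac{z_i}{X - z_i}.$$

Taking $X = 1$ and remembering that $g_x(1) = x f'(x)$, we obtain

$$x f'(x) = \frac{1}{n} \sum_{i=1}^{n} \bigl(f(x z_i) - f(x)\bigr) \frac{2 z_i}{(1 - z_i)^2},$$

so we only need to prove that

$$\sum_{i=1}^{n} \frac{2 z_i}{(1 - z_i)^2} = -\frac{n^2}{2}.$$

Simply choose $f(X) = X^n$ in the equality

$$x f'(x) = \frac{1}{n} \sum_{i=1}^{n} \bigl(f(x z_i) - f(x)\bigr) \frac{2 z_i}{(1 - z_i)^2}$$

and use $z_i^n = -1$.

¹ Obtained by differentiating the identity $X^n + 1 = \prod_{i=1}^{n} (X - z_i)$ and taking $X = z_i$.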
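b) By scaling $f$, we may assume that $\|f\| = 1$. Choose any $|z| = 1$ and
use the identity in (a) to deduce that

$$|f'(z)| \le \frac{n}{2} + \frac{1}{n} \sum_{k=1}^{n} \left| \frac{2 z_k}{(1 - z_k)^2} \right|.$$

The miracle is that all numbers

$$\frac{2 z_k}{(1 - z_k)^2} = \frac{1}{\frac{1}{2}\left(z_k + \frac{1}{z_k}\right) - 1} = \frac{1}{\operatorname{Re}(z_k) - 1}$$

are actually negative, so the previous inequality becomes

$$|f'(z)| \le \frac{n}{2} - \frac{1}{n} \sum_{k=1}^{n} \frac{2 z_k}{(1 - z_k)^2}.$$

Since we've already seen that

$$\sum_{i=1}^{n} \frac{2 z_i}{(1 - z_i)^2} = -\frac{n^2}{2},$$

we can now safely conclude that $|f'(z)| \le n$, finishing the proof of this hard
problem. □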
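
The two facts driving this proof can be checked numerically. The following Python sketch is only a sanity check under floating-point arithmetic, not part of O'Hara's argument; the helper names `roots_of_minus_one`, `weight` and `identity_gap` are illustrative choices. It computes the weights $2z_k/(1 - z_k)^2$ and measures how far the identity of part a) is from holding for a given polynomial.

```python
import cmath

def roots_of_minus_one(n):
    """The n complex roots of X^n + 1."""
    return [cmath.exp(1j * cmath.pi * (2 * k + 1) / n) for k in range(n)]

def weight(z):
    """The coefficient 2z / (1 - z)^2 attached to a root z."""
    return 2 * z / (1 - z) ** 2

def identity_gap(coeffs, n, x):
    """|x f'(x) - (n/2) f(x) - (1/n) sum_k f(x z_k) w_k| for f = sum coeffs[m] X^m.

    The polynomial is assumed to have degree at most n.
    """
    f = lambda t: sum(c * t ** m for m, c in enumerate(coeffs))
    df = lambda t: sum(m * c * t ** (m - 1) for m, c in enumerate(coeffs) if m > 0)
    rhs = n / 2 * f(x) + sum(f(x * z) * weight(z) for z in roots_of_minus_one(n)) / n
    return abs(x * df(x) - rhs)

if __name__ == "__main__":
    for n in range(1, 6):
        ws = [weight(z) for z in roots_of_minus_one(n)]
        print(n, sum(ws).real, -n * n / 2)
```

For each small $n$ the printed sum of weights should agree with $-n^2/2$ up to rounding, and `identity_gap` should be close to zero for any polynomial of degree at most $n$.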


We end this chapter with a beautiful inequality due to Gelfond, who used
it in his solution of Hilbert's seventh problem. This famous problem asked to
prove that $a^b$ is transcendental whenever $b$ is an algebraic irrational number
and $a$ is an algebraic number different from 0 and 1.


17. Let $f$ be a complex polynomial of degree at most $n$ and let $z_0, z_1, \ldots, z_d$
be the zeros of the polynomial $X^{d+1} - 1$, where $d \ge n$. Define

$$\|f\| = \max_{|z|=1} |f(z)|.$$

a) Prove that if there exist $n + 1$ pairwise distinct numbers
$x_0, x_1, \ldots, x_n$ among $z_0, z_1, \ldots, z_d$ such that $|f(x_i)| < \frac{1}{2^d}$ for all $i$, then
$\|f\| < 1$.

b) Deduce that

$$\|f\| \cdot \|g\| \le 4^{\deg f + \deg g} \cdot \|fg\|.$$

A.O. Gelfond


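Proof. a) The first step is rather clear: Lagrange's interpolation formula for
the points $x_0, x_1, \ldots, x_n$ yields

$$f(z) = \sum_{i=0}^{n} f(x_i) \prod_{j \ne i} \frac{z - x_j}{x_i - x_j}.$$

Combining this with the triangle inequality and the hypothesis, we obtain for
$|z| = 1$

$$|f(z)| < \frac{1}{2^d} \sum_{i=0}^{n} \prod_{j \ne i} \frac{|z - x_j|}{|x_i - x_j|}.$$

We brutally bound each $|z - x_j|$ by 2, so that $\prod_{j \ne i} |z - x_j| \le 2^n$. The subtle
part is to deal with $\prod_{j \ne i} |x_i - x_j|$. Write

$$\{y_1, \ldots, y_{d-n}\} = \{z_0, \ldots, z_d\} \setminus \{x_0, \ldots, x_n\}$$

and observe that

$$\prod_{j \ne i} |x_i - x_j| \cdot \prod_{k=1}^{d-n} |x_i - y_k| = \Bigl| \prod_{z_j \ne x_i} (x_i - z_j) \Bigr| = d + 1,$$

because $\prod_{z_j \ne x_i} (x_i - z_j)$ is exactly the derivative of $X^{d+1} - 1$ at $x_i$, which has
absolute value $d + 1$. Since $|x_i - y_k| \le 2$, we infer that

$$\prod_{j \ne i} |x_i - x_j| \ge \frac{d + 1}{2^{d-n}}.$$

All in all, each term $\prod_{j \ne i} \frac{|z - x_j|}{|x_i - x_j|}$ can be bounded from above by

$$\frac{2^n \cdot 2^{d-n}}{d + 1} = \frac{2^d}{d + 1}.$$

Since there are $n + 1$ terms and $n + 1 \le d + 1$, we deduce that for all $|z| = 1$
we have $|f(z)| < 1$ and so $\|f\| < 1$.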

b) This part is an easy consequence of the first one. Scaling $f$ and $g$
allows us to assume that $\|f\| = \|g\| = 1$. Let $d = \deg(f) + \deg(g)$. Assuming
that $\|fg\| < \frac{1}{4^d}$, we deduce that for any $0 \le i \le d$ we have $|f(z_i)| < \frac{1}{2^d}$ or
$|g(z_i)| < \frac{1}{2^d}$. Since there are $d + 1 = \deg(f) + \deg(g) + 1$ points $z_i$, it follows
that either we can find at least $\deg(f) + 1$ $z_i$'s such that $|f(z_i)| < \frac{1}{2^d}$ or we can
find at least $\deg(g) + 1$ $z_i$'s for which $|g(z_i)| < \frac{1}{2^d}$. Both cases are excluded by
part a), and the result follows. □


11.1 Notes 


We would like to thank the following people for providing solutions to the 
following problems: Xiangyi Huang (problems 6, 7, 8, 9, 12), Qiaochu Yuan 
(problems 1, 3, 10). 


Chapter 12 


Higher Algebra in 
Combinatorics 


This last chapter deals with applications of linear algebra in combinatorics.
There is a huge literature on this subject, which is still very active, so
all we can do in this chapter is to barely scratch the surface and present the
reader with some interesting elementary examples. Of course, this does not
mean that there aren't a lot of difficult problems in this chapter.

Before passing to problems, let us recall some basic results of linear algebra.
Let $K$ be a field. A $K$-vector space $V$ is an abelian group endowed
with an external multiplication $K \times V \to V$, denoted $(a, v) \mapsto a \cdot v$, which is
compatible with the additive operation of $V$ and with the operations in $K$. A
subspace of $V$ is an additive subgroup which is stable under multiplication by
elements of $K$. A family $(v_i)_{i \in I}$ of elements of $V$ is called linearly independent
if for all finite subsets $S \subset I$ and all $(a_s)_{s \in S} \in K^{|S|}$, the relation $\sum_{s \in S} a_s \cdot v_s = 0$
forces $a_s = 0$ for all $s \in S$. A family $(v_i)_{i \in I}$ is called generating if any vector
$v \in V$ can be written $v = \sum_{s \in S} a_s \cdot v_s$ for some finite subset $S \subset I$ and some
$a_s \in K$. Finally, a basis of $V$ is a linearly independent family which is
simultaneously generating. Using Zorn's lemma, one can prove that any vector
space has a basis and that any linearly independent family of vectors can be
extended to a basis. In most of the following problems we will only consider
finite dimensional spaces, i.e. vector spaces having a finite generating family.
The dimension of such a space is by definition the number of elements of a
basis (it is not obvious, but it can be proved that this does not depend on the
choice of the basis).

If $V$ and $W$ are vector spaces over $K$, a map $f : V \to W$ is called linear if
$f(v_1 + v_2) = f(v_1) + f(v_2)$ for all $v_1, v_2 \in V$ and $f(a \cdot v) = a \cdot f(v)$ for all $v \in V$
and $a \in K$. The kernel of $f$ is then the subspace $\operatorname{Ker}(f) = \{v \in V \mid f(v) = 0\}$
of $V$. A fundamental result in linear algebra is the following formula

$$\dim V = \dim \operatorname{Ker}(f) + \dim \operatorname{Im}(f).$$

Traditionally, $\dim \operatorname{Im}(f)$ is called the rank of $f$.

Let $f : V \to V$ be a linear map, where $V$ is a $K$-vector space of dimension
$n$. If $e_1, e_2, \ldots, e_n$ is a basis of $V$, the matrix of $f$ in this basis is defined by the
equalities $f(e_i) = \sum_{j=1}^{n} a_{ji} e_j$. It is easy to check that the matrix associated
to the composition of $f$ and $g$ is simply the product of the matrices associated
to $f$ and $g$. Conversely, any matrix can be naturally seen as the matrix of an
endomorphism of $V$ in the given basis.

If $A$ is an $n \times n$ matrix with entries in some commutative ring $R$, its
determinant is defined by

$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)},$$

$S_n$ being the group of permutations of $\{1, 2, \ldots, n\}$ and $\varepsilon(\sigma)$ being the
signature of $\sigma$. The most important (and rather surprising!) property of the
determinant is its multiplicativity, i.e. $\det(AB) = \det A \cdot \det B$. Another
important property is that $A$ is invertible in $M_n(R)$ if and only if $\det A$ is a unit of
$R$. Thus, if $R$ is a field, we can test if a matrix is invertible by testing whether
its determinant is nonzero, and this happens if and only if its associated
endomorphism is bijective. Let $M_n(K)$ be the space of $n \times n$ matrices with entries
in $K$. If $A \in M_n(K)$, its characteristic polynomial is $\det(X \cdot I_n - A)$, a monic
polynomial of degree $n$ with coefficients in $K$. Its roots in an algebraic closure
$\overline{K}$ of $K$ are called the eigenvalues of $A$. Hence, if $\lambda$ is an eigenvalue of $A$, then
we can find a nonzero $v \in \overline{K}^n$ such that $A \cdot v = \lambda \cdot v$, i.e. $\sum_{j=1}^{n} a_{ij} v_j = \lambda \cdot v_i$ for all
$i$. It is not difficult to check that the sum of the eigenvalues of $A$ (counted
with multiplicities) is equal to the trace of $A$, which is by definition the sum
of the diagonal elements of $A$. Also, $\det A$ is the product of the eigenvalues of
$A$ (again counted with multiplicities). It is a little bit trickier to prove that
the eigenvalues of $g(A)$ (where $g \in K[X]$) are $g(\lambda_1), g(\lambda_2), \ldots, g(\lambda_n)$, where
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$.
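
The permutation-sum definition of the determinant and its multiplicativity can be illustrated on small integer matrices. The Python sketch below is a direct (exponential-time) transcription of the definition, intended only for tiny $n$; the function names and the two sample matrices `A` and `B` are assumptions made for the example.

```python
from itertools import permutations

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant as the signed sum over all permutations."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Two sample 3 x 3 integer matrices (arbitrary choices for the illustration).
A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]

if __name__ == "__main__":
    print(det(A), det(B), det(matmul(A, B)))
```

Since the entries are integers, the comparison $\det(AB) = \det A \cdot \det B$ is exact here.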


12.1 The determinant trick 


We present in this section a few tricky combinatorial problems in which 
the determinant of a matrix plays an important role. 


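1. Let $n > 2$. Find the greatest $p$ such that for all $k \in \{1, 2, \ldots, p\}$ we have

$$\sum_{\sigma \in A_n} \Bigl( \sum_{i=1}^{n} i\,\sigma(i) \Bigr)^{k} = \sum_{\sigma \in B_n} \Bigl( \sum_{i=1}^{n} i\,\sigma(i) \Bigr)^{k},$$

where $A_n$, $B_n$ are the sets of all even and odd permutations of the set
$\{1, 2, \ldots, n\}$ respectively.

Gabriel Dospinescu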
Proof. The first ingredient is the following:

Lemma 12.1. Let $a_1, a_2, \ldots, a_m$ and $b_1, b_2, \ldots, b_m$ be positive integers and let
$N$ be a positive integer. Then

$$a_1^k + a_2^k + \cdots + a_m^k = b_1^k + b_2^k + \cdots + b_m^k$$

for all $1 \le k \le N$ if and only if $(X - 1)^{N+1}$ divides

$$X^{a_1} + X^{a_2} + \cdots + X^{a_m} - X^{b_1} - X^{b_2} - \cdots - X^{b_m}.$$

Proof. This is quite easy: if

$$f(X) = X^{a_1} + X^{a_2} + \cdots + X^{a_m} - X^{b_1} - X^{b_2} - \cdots - X^{b_m},$$

then $(X - 1)^{N+1}$ divides $f$ if and only if $f^{(j)}(1) = 0$ for all $0 \le j \le N$. This
happens if and only if

$$\sum_{i=1}^{m} a_i (a_i - 1) \cdots (a_i - j + 1) = \sum_{i=1}^{m} b_i (b_i - 1) \cdots (b_i - j + 1)$$

for all $0 \le j \le N$. As the polynomials $X(X - 1) \cdots (X - j + 1)$ for $1 \le j \le N$
span the same vector space as the polynomials $X, X^2, \ldots, X^N$ (this is
immediate), the previous equality holds if and only if

$$a_1^k + a_2^k + \cdots + a_m^k = b_1^k + b_2^k + \cdots + b_m^k$$

for all $1 \le k \le N$. The conclusion follows. □


Using the previous result, we deduce that $p + 1$ is the multiplicity of 1 as
a root of the polynomial

$$f(X) = \sum_{\sigma \in A_n} X^{S(\sigma)} - \sum_{\sigma \in B_n} X^{S(\sigma)},$$

where $S(\sigma) = \sigma(1) + 2\sigma(2) + \cdots + n\sigma(n)$. The key point is that we can write

$$f(X) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, X^{\sigma(1)} \cdot X^{2\sigma(2)} \cdots X^{n\sigma(n)} = \det(A),$$

where $a_{ij} = X^{ij}$, $S_n$ is the set of all permutations of $\{1, 2, \ldots, n\}$ and $\varepsilon(\sigma)$ is
the signature of $\sigma$ (i.e. 1 if $\sigma \in A_n$, $-1$ if $\sigma \in B_n$). Finally, using properties
of Vandermonde determinants, we can write

$$f(X) = X^{\frac{n(n+1)}{2}} \prod_{1 \le i < j \le n} (X^j - X^i),$$

and since the multiplicity of 1 in $X^j - X^i$ is 1 for all $i \ne j$, we deduce that
$p + 1 = \binom{n}{2}$. Thus the answer to the problem is $\binom{n}{2} - 1$. □
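
For small $n$ the answer $\binom{n}{2} - 1$ can be confirmed by brute force. The following Python sketch is an independent check rather than the method of the proof; the names `is_even` and `greatest_p` and the cut-off `k_max` are assumptions of this illustration. It enumerates all permutations, splits the values $S(\sigma) = \sum_i i\,\sigma(i)$ by parity of $\sigma$, and compares power sums.

```python
from itertools import permutations
from math import comb

def is_even(perm):
    """True if the permutation (a tuple of distinct integers) has an even number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return inversions % 2 == 0

def greatest_p(n, k_max=20):
    """Largest p <= k_max such that the k-th power sums of S(sigma) over even and
    odd permutations agree for every 1 <= k <= p."""
    even_vals, odd_vals = [], []
    for perm in permutations(range(1, n + 1)):
        s = sum((i + 1) * perm[i] for i in range(n))  # S(sigma) = sum of i * sigma(i)
        (even_vals if is_even(perm) else odd_vals).append(s)
    p = 0
    for k in range(1, k_max + 1):
        if sum(v ** k for v in even_vals) != sum(v ** k for v in odd_vals):
            break
        p = k
    return p

if __name__ == "__main__":
    for n in range(2, 6):
        print(n, greatest_p(n), comb(n, 2) - 1)
```

The enumeration costs $n!$ steps, so this is only practical for very small $n$; `k_max` must exceed $\binom{n}{2} - 1$ for the result to be meaningful.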


The next problem is rather strange, both because of the statement and 
because of one of its proofs. We also present a more natural proof, based on 
the inclusion-exclusion principle. 


2. For a permutation $\sigma$ of $\{1, 2, \ldots, n\}$ let $\varepsilon(\sigma) = 1$ if $\sigma$ is even and $-1$
otherwise. Let $f(\sigma)$ be the number of fixed points of $\sigma$. Prove that

$$\sum_{\sigma} \frac{\varepsilon(\sigma)}{1 + f(\sigma)} = (-1)^{n+1} \frac{n}{n + 1},$$

where the sum is taken over all permutations $\sigma$ of $\{1, 2, \ldots, n\}$.
Putnam 2005


Proof. We start with the exotic proof: observe that

$$\sum_{\sigma} \frac{\varepsilon(\sigma)}{1 + f(\sigma)} = \sum_{\sigma} \varepsilon(\sigma) \int_0^1 x^{f(\sigma)}\, dx = \int_0^1 \Bigl( \sum_{\sigma} \varepsilon(\sigma)\, x^{f(\sigma)} \Bigr) dx.$$

We recognize the determinant of the matrix $A(x)_{ij} = x^{\delta_{ij}}$ (that is, $x$ on the
diagonal and 1 elsewhere) under the integral.

But $\det(A(x))$ can also be computed using row-column operations: add all
columns to the first one, take out the common factor $x + n - 1$ and then
subtract the first column (which consists now only of ones) from all the others,
to make the elements in the first row equal to 0, except for the first one. Next,
expand the determinant using the first row. We obtain in this way that

$$\det A(x) = (x + n - 1)(x - 1)^{n-1}.$$

Thus

$$\sum_{\sigma} \frac{\varepsilon(\sigma)}{1 + f(\sigma)} = \int_0^1 (x + n - 1)(x - 1)^{n-1}\, dx = \int_{-1}^{0} (x + n)\, x^{n-1}\, dx = \frac{(-1)^{n+2}}{n + 1} - (-1)^n = (-1)^{n+1} \frac{n}{n + 1},$$

which is precisely the desired result. □


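Proof. Here is a more natural proof: we will partition the permutations $\sigma$
according to their fixed points. For a permutation $\sigma$, let $\operatorname{Fix}(\sigma)$ be the set of
its fixed points and let $f(\sigma) = |\operatorname{Fix}(\sigma)|$. As

$$\sum_{\sigma} \frac{\varepsilon(\sigma)}{1 + f(\sigma)} = \sum_{k=0}^{n} \frac{1}{k + 1} \sum_{|A| = k} \Bigl( \sum_{\operatorname{Fix}(\sigma) = A} \varepsilon(\sigma) \Bigr),$$

we naturally try to understand $F(A) = \sum_{\operatorname{Fix}(\sigma) = A} \varepsilon(\sigma)$. We will compute
$F(A)$ using the inclusion-exclusion principle, according to which

$$F(A) = \sum_{A \subset B} (-1)^{|B - A|} \sum_{B \subset C} F(C).$$

On the other hand,

$$\sum_{B \subset C} F(C) = \sum_{\sigma|_B = 1} \varepsilon(\sigma).$$

The notation $\sigma|_B = 1$ means that $\sigma(b) = b$ for all $b \in B$. Now, permutations
such that $\sigma|_B = 1$ correspond to permutations of the complement of $B$, so
if $n - |B| \ge 2$ we have $\sum_{\sigma|_B = 1} \varepsilon(\sigma) = 0$. We deduce that $\sum_{B \subset C} F(C) = 0$
unless $|B| \ge n - 1$, when it is equal to 1. Combining this with the previous
formula (inclusion-exclusion principle) yields $F(A) = (-1)^{n - |A|} (|A| + 1 - n)$.
We deduce that

$$\sum_{\sigma} \frac{\varepsilon(\sigma)}{1 + f(\sigma)} = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} \frac{k + 1 - n}{k + 1} = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} - n \sum_{k=0}^{n} \frac{(-1)^{n-k}}{k + 1} \binom{n}{k} = -\frac{n}{n + 1} \sum_{k=0}^{n} (-1)^{n-k} \binom{n + 1}{k + 1}$$

and the result follows immediately. □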
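
The identity is easy to confirm by exhaustive enumeration for small $n$. The Python sketch below is a brute-force check, independent of both proofs; the function name `signed_fixed_point_sum` is an illustrative choice. It uses exact rational arithmetic so that the comparison with $(-1)^{n+1} \frac{n}{n+1}$ involves no rounding.

```python
from fractions import Fraction
from itertools import permutations

def signed_fixed_point_sum(n):
    """Exact value of the sum of eps(sigma) / (1 + f(sigma)) over all permutations of n points."""
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        eps = -1 if inversions % 2 else 1
        fixed = sum(1 for i in range(n) if perm[i] == i)
        total += Fraction(eps, 1 + fixed)
    return total

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, signed_fixed_point_sum(n), Fraction((-1) ** (n + 1) * n, n + 1))
```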


The next problem crucially uses the multiplicativity of the determinant.
It is (rather amusingly) a consequence of the irrationality of $\sqrt{5}$.


3. Is there in the plane a configuration of 22 circles and 22 points on their 
union (the union of their circumferences) such that any circle contains 
at least 7 points and any point belongs to at least 7 circles? 


Gabriel Dospinescu, Moldavian TST 2004 




Proof. The answer is negative. First, we will use the standard trick of counting
configurations $(P, \{C_1, C_2\})$, which we call triples, where $C_1, C_2$ are distinct circles
among the 22 ones and $P$ is a point among the 22 ones that belongs to $C_1 \cap C_2$. Each
point belongs to at least 7 circles, so for each $P$ there are at least $\binom{7}{2}$ sets $\{C_1, C_2\}$ living
in a triple with $P$. So the number of triples is at least $22 \cdot \binom{7}{2}$. On the other
hand, for each $\{C_1, C_2\}$, $C_1 \cap C_2$ has at most 2 elements, so there are at most
$2 \cdot \binom{22}{2}$ triples. The miracle is that we actually have $2 \cdot \binom{22}{2} = 22 \cdot \binom{7}{2}$. Thus,
all previous inequalities are actually equalities.

Let $P_1, P_2, \ldots, P_{22}$ be the points and let $C_1, C_2, \ldots, C_{22}$ be the circles.
Consider now the matrix $A$ whose $(i, j)$-entry is 1 if $P_j \in C_i$ and 0 otherwise.
The previous paragraph shows that $A \cdot A^t$ is the matrix whose elements on
the main diagonal are equal to 7, all other elements being equal to 2. An easy
computation shows that the determinant of this matrix is $49 \cdot 5^{21}$. But this
determinant is also equal to $(\det A)^2$, which is a perfect square. We deduce
that 5 is a perfect square, which is a bit difficult to make happen... □
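
The "easy computation" of the determinant can be reproduced exactly by machine. The Python sketch below is one possible way to do it, using Gaussian elimination over the rationals; the helper name `det_exact` and the variable names are assumptions of this illustration, not the method the authors had in mind.

```python
from fractions import Fraction
from math import isqrt

def det_exact(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det

# The 22 x 22 matrix with 7 on the diagonal and 2 elsewhere.
gram = [[7 if i == j else 2 for j in range(22)] for i in range(22)]
gram_det = int(det_exact(gram))

def is_perfect_square(m):
    """True if the nonnegative integer m is a perfect square."""
    return isqrt(m) ** 2 == m

if __name__ == "__main__":
    print(gram_det == 49 * 5 ** 21, is_perfect_square(gram_det))
```

The value agrees with the general formula $\det(aI + bJ) = a^{n-1}(a + nb)$ for $a = 5$, $b = 2$, $n = 22$.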


We end this section with another rather challenging problem on permu- 
tations. 


4. A permutation $\sigma$ of $\{1, 2, \ldots, n\}$ is called $k$-limited if $|\sigma(i) - i| \le k$ for
all $1 \le i \le n$. Prove that the number of $k$-limited permutations of
$\{1, 2, \ldots, n\}$ is odd if and only if $n \equiv 0, 1 \pmod{2k + 1}$.


Putnam 2008 


Proof. Let $M$ be the $n \times n$ matrix defined by $M_{ij} = 1_{|i - j| \le k}$. It is clear that
the number of $k$-limited permutations has the same parity as $\det M$. So the
question is when $M$ is invertible in $M_n(\mathbb{F}_2)$. From now on, we always work in
$\mathbb{F}_2$. Let us prove that if $n \equiv 2, \ldots, 2k \pmod{2k + 1}$, then $M$ is not invertible.
Pick integers $0 \le a < b \le k$ such that $n + a + b \equiv 0 \pmod{2k + 1}$ (it is
clear that they exist under the assumption that $n \equiv 2, \ldots, 2k \pmod{2k + 1}$)
and set $j = (n + a + b)/(2k + 1)$. If $r_i$ is the $i$-th row of $M$, then the vector
all of whose components are 1 can be written as $\sum_{i=0}^{j-1} r_{k+1-a+(2k+1)i}$ and as
$\sum_{i=0}^{j-1} r_{k+1-b+(2k+1)i}$. Hence there is a nontrivial combination of the rows of $M$
which yields the zero vector. Thus $M$ is not invertible in this case.

Assume now that $n \equiv 0, 1 \pmod{2k + 1}$ and that $a_1, a_2, \ldots, a_n$ are scalars
such that $a_1 r_1 + a_2 r_2 + \cdots + a_n r_n$ is the zero vector. Put $a_i = 0$ if $i \notin \{1, \ldots, n\}$.
Then $a_{m-k} + a_{m-k+1} + \cdots + a_{m+k} = 0$ for all $1 \le m \le n$. By comparing the relations
for $m$ and $m + 1$, we obtain $a_{m-k} = a_{m+k+1}$ for $1 \le m < n$, so the $a_i$ repeat
with period $2k + 1$. Taking $m = 1, \ldots, k$ further yields the equalities $a_{k+2} =
\cdots = a_{2k+1} = 0$. Taking $m = n - k, \ldots, n - 1$ gives another chain of equalities
$a_{n-2k} = \cdots = a_{n-1-k} = 0$. If $n \equiv 0 \pmod{2k + 1}$, this can be rewritten as
$a_1 = \cdots = a_k = 0$, whereas for $n \equiv 1 \pmod{2k + 1}$, it can be rewritten as
$a_2 = \cdots = a_{k+1} = 0$. In either case, since we also have $a_1 + \cdots + a_{2k+1} = 0$,
we deduce that all of the $a_i$ must be zero. The conclusion follows. □
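
The parity criterion can be tested directly for small parameters. The Python sketch below simply enumerates permutations (it does not use the matrix $M$ of the proof); the function name `count_k_limited` is an illustrative choice, and the enumeration is only feasible for small $n$.

```python
from itertools import permutations

def count_k_limited(n, k):
    """Number of permutations sigma of {0, ..., n-1} with |sigma(i) - i| <= k for all i."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(abs(perm[i] - i) <= k for i in range(n))
    )

if __name__ == "__main__":
    for k in (1, 2):
        for n in range(1, 8):
            count = count_k_limited(n, k)
            print(k, n, count, count % 2 == 1, n % (2 * k + 1) in (0, 1))
```

In each printed row the last two columns should coincide: the count is odd exactly when $n \equiv 0, 1 \pmod{2k + 1}$.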


12.2 Matrices over $\mathbb{F}_2$


The reduction map $\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}$ induces a natural map $M_n(\mathbb{Z}) \to
M_n(\mathbb{Z}/2\mathbb{Z})$, which is easily seen to be a ring homomorphism. This map is
very useful when dealing with applications of linear algebra in combinatorics,
since for a lot of problems the parity gives already enough information. Also,
the incidence matrices of families of sets are binary matrices and so are the
adjacency matrices of graphs. To simplify notation, we let $\mathbb{F}_2 = \mathbb{Z}/2\mathbb{Z}$.


5. 2n + 1 real numbers have the property that no matter how we eliminate 
one of them, the rest can be divided into two groups of n numbers, the 
sum of the numbers in the two groups being the same. Prove that the 
numbers are equal. 


Proof. Let $x_1, x_2, \ldots, x_{2n+1}$ be real numbers as in the problem and let $X$ be
the column vector whose coordinates are the $x_i$'s. We can write the hypothesis
in the form $AX = 0$ for some matrix $A = (a_{ij})$ with $a_{ij} \in \{-1, 0, 1\}$, $a_{ii} = 0$
and such that the sum of the numbers in each row of the matrix $(a_{ij})$ is zero.
Of course, this linear system of equations has the trivial solution $x_1 = x_2 =
\cdots = x_{2n+1}$ and the problem asks to prove that this is the only solution.
Now, the dimension of the vector space of solutions is $\dim \operatorname{Ker} A$, which is
also $2n + 1 - \operatorname{rk} A$. Thus, it is enough to prove that $A$ has rank $2n$. Of
course, the rank is at most $2n$, since the sum of the elements in each line is
0. But if we see $A$ as a matrix over $\mathbb{F}_2$, $A$ becomes the matrix with 0 on
the diagonal and 1 elsewhere. Since $2n + 1$ is odd, it is easy to check that
this matrix has rank $2n$ (the associated system of equations over $\mathbb{F}_2$ is simply
$x_2 + \cdots + x_{2n+1} = 0, \ldots, x_1 + \cdots + x_{2n} = 0$, whose only solutions are those with
$x_1 = x_2 = \cdots = x_{2n+1}$). But then $A$ must have rank at least $2n$, since the rank
can only decrease after reduction mod 2 (this is simply saying that a nonzero
element of $\mathbb{F}_2$ lifts to an odd, thus nonzero, integer). The result follows. □
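
The rank computation over $\mathbb{F}_2$ at the heart of this proof can be carried out mechanically. The Python sketch below is a plain Gauss-Jordan elimination modulo 2, offered as an illustration; the names `rank_mod2` and `off_diagonal_ones` are assumptions of this example. Entries are reduced modulo 2 on the fly, so a matrix with entries in $\{-1, 0, 1\}$ can be passed directly.

```python
def rank_mod2(matrix):
    """Rank over F_2 of an integer matrix (entries are read modulo 2)."""
    rows = [list(row) for row in matrix]
    rank = 0
    for c in range(len(rows[0])):
        # Find a row at or below position `rank` with an odd entry in column c.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c] % 2), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][c] % 2:
                rows[r] = [(x + y) % 2 for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def off_diagonal_ones(size):
    """The size x size matrix with 0 on the diagonal and 1 elsewhere."""
    return [[0 if i == j else 1 for j in range(size)] for i in range(size)]

if __name__ == "__main__":
    for n in range(1, 6):
        print(2 * n + 1, rank_mod2(off_diagonal_ones(2 * n + 1)))
```

For odd size $2n + 1$ the rank comes out as $2n$, while for even size the same matrix is invertible over $\mathbb{F}_2$.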


Proof. The classical proof is done in three steps. Let $x_1, \ldots, x_{2n+1}$ be the
numbers in the problem. In the first step we assume that all $x_i$ are integers.
The hypothesis clearly implies that all $x_i$ have the same parity as $x_1 + x_2 +
\cdots + x_{2n+1}$. Thus all $x_i$ have the same parity. Writing them either as $2y_i$ or
as $2y_i + 1$, it is clear that $y_1, \ldots, y_{2n+1}$ satisfy the same property as the $x_i$'s.
Since the sum of the absolute values of the $y_i$'s is smaller than that of the $x_i$'s,
we deduce after finitely many steps that all $x_i$ are equal (because if the process
continues forever then all the $x_i$ must have been 0).

The second step treats the case when the $x_i$'s are rational numbers. Multiplying
them by a suitable positive integer $N$ to make them integers, we reduce
this case trivially to the one considered above.

Finally, we consider the general case when the $x_i$'s are real numbers. Let
$e_1, e_2, \ldots, e_k$ be a basis of the $\mathbb{Q}$-vector space generated by the $x_i$'s. Write
each $x_i = a_{i1} e_1 + a_{i2} e_2 + \cdots + a_{ik} e_k$. Since the $e_j$'s are linearly independent,
it follows that for each fixed $j$, the numbers $a_{1j}, a_{2j}, \ldots, a_{2n+1,j}$ satisfy the
same properties as the $x_i$'s. Since these numbers are rational, it follows that
they are equal by the second step. But since this holds for every $j$, we have
$x_1 = x_2 = \cdots = x_{2n+1}$. □


Binary matrices are very useful when dealing with iterations of some process
on a combinatorial structure. The fact that 2 kills $M_n(\mathbb{F}_2)$ implies that
for any commuting matrices $A, B \in M_n(\mathbb{F}_2)$ and for any $k \ge 1$ we have
$(A + B)^{2^k} = A^{2^k} + B^{2^k}$. This is very useful in computations.


6. The edges of a regular $2^n$-gon are colored red and blue. A step consists
in recoloring each edge whose neighbors have the same color in red and
recoloring each edge whose neighbors have different colors in blue. Prove
that after $2^{n-1}$ steps all of the edges will be red and that this need not
hold after fewer steps.


Iranian Olympiad 1998


Proof. The easy part is the second question: if there is exactly one blue edge,
then clearly after $k < 2^{n-1}$ steps the $k$-th edge after this blue edge is blue, so
it is impossible to make all edges red after fewer than $2^{n-1}$ steps.

Let us now prove that after $2^{n-1}$ steps all edges are red. Let $X_j$ be the
vector giving the state of the edges after the $j$-th step, so $X_j$ is the column
vector with $2^n$ coordinates, the $k$-th coordinate being 1 if the $k$-th edge is
blue after $j$ steps and 0 otherwise. By definition we have $X_{j+1} = A X_j$, where
everything is taken modulo 2, $A = B + B^{-1}$ and $B_{ij} = 1$ if $j \equiv i + 1 \pmod{2^n}$
and 0 otherwise. Since $B$ and $B^{-1}$ commute, the binomial formula and the fact
that $\binom{2^{n-1}}{k}$ is even for all $1 \le k \le 2^{n-1} - 1$ show that $A^{2^{n-1}} \equiv B^{2^{n-1}} + B^{-2^{n-1}}
\pmod 2$. However, it is very easy to compute the successive powers of $B$:
a trivial induction shows that $B^k$ is simply the matrix whose $(i, j)$ entry is 1 if
$j - i \equiv k \pmod{2^n}$ and 0 otherwise. In particular, $B^{2^n} = I$, hence $B^{2^{n-1}} = B^{-2^{n-1}}$,
and so by the previous formula $A^{2^{n-1}} \equiv 0 \pmod 2$. But since $X_j = A^j X_0$, it is clear that
after $2^{n-1}$ steps all edges are red. □
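
The recoloring process is easy to simulate. In the Python sketch below (an illustration with hypothetical helper names `step` and `steps_until_all_red`), blue is encoded as 1 and red as 0, so one step replaces each edge by the sum modulo 2 of its two neighbors, exactly the map $X \mapsto (B + B^{-1})X$ of the proof.

```python
from itertools import product

def step(state):
    """One recoloring step on a cyclic tuple of edges; 1 = blue, 0 = red."""
    m = len(state)
    return tuple(state[(i - 1) % m] ^ state[(i + 1) % m] for i in range(m))

def steps_until_all_red(state, limit):
    """Smallest t <= limit such that all edges are red after t steps, or None."""
    state = tuple(state)
    for t in range(limit + 1):
        if not any(state):
            return t
        state = step(state)
    return None

if __name__ == "__main__":
    n = 3
    m = 2 ** n
    worst = max(steps_until_all_red(c, m) for c in product((0, 1), repeat=m))
    print(worst)  # largest number of steps needed over all colorings of the octagon
```

For the octagon ($n = 3$) every coloring becomes all red within $2^{n-1} = 4$ steps, and a single blue edge needs all 4 of them.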


The following problem is a classical old result, which has a very beautiful 
linear-algebraic proof. 


7. The map $s : \mathbb{R}^r \to \mathbb{R}^r$ is defined by

$$s(a_1, a_2, \ldots, a_r) = (|a_1 - a_2|, |a_2 - a_3|, \ldots, |a_r - a_1|).$$

Prove the equivalence of the following statements:

i) for all nonnegative integers $a_1, a_2, \ldots, a_r$, there exists $n$ such that
the $n$-th iterate of $s$ evaluated at $(a_1, a_2, \ldots, a_r)$ is $(0, 0, \ldots, 0)$;

ii) $r$ is a power of 2.


Ducci's problem




Proof. Suppose first that $r$ is a power of 2. Observe that the maximal coordinate
of $s(a_1, a_2, \ldots, a_r)$ is clearly smaller than or equal to the maximal
coordinate of $(a_1, a_2, \ldots, a_r)$. Start with some nonnegative integers $(a_1, a_2, \ldots, a_r)$
and define $X_0 = (a_1, a_2, \ldots, a_r)$ and $X_{n+1} = s(X_n)$. By the previous
remark, all $X_n$'s live in the hypercube $[0, \max a_i]^r$. We will prove that the
$X_n$'s become arbitrarily divisible by 2, thus one of them will have to be equal
to 0. To prove this, it is enough to prove that one of the $X_n$'s has all
coordinates even numbers, since then we can repeat the procedure to make
the coordinates of another $X_n$ multiples of 4 and so on. Now, note that
if $X_n = (a_1(n), a_2(n), \ldots, a_r(n))$ then $X_{n+1} \equiv (1 + T)(X_n) \pmod 2$, where
$T(x_1, x_2, \ldots, x_r) = (x_2, x_3, \ldots, x_1)$. Thus $X_n \equiv (1 + T)^n X_0 \pmod 2$. Now,
since $\binom{r}{k}$ is even for all $1 \le k \le r - 1$ (because $r$ is a power of 2), we have

$$(1 + T)^r X_0 = \sum_{k=0}^{r} \binom{r}{k} T^k X_0 \equiv X_0 + T^r X_0 \pmod 2.$$

But trivially $T^r = 1$, the identity map, so $X_0 + T^r X_0 = 2X_0 \equiv 0 \pmod 2$.
Thus all coordinates of $X_r$ are even and we are done.

Assume now that i) holds. We deduce that for any vector $x \in \mathbb{F}_2^r$ there
exists $n$ such that $(1 + T)^n x = 0$. Indeed, any such $x$ is the reduction mod 2
of an $r$-tuple of nonnegative integers $(a_1, a_2, \ldots, a_r)$ and we know that we can
find $n$ such that $s^n((a_1, a_2, \ldots, a_r)) = (0, \ldots, 0)$. But we also saw that

$$s^n((a_1, a_2, \ldots, a_r)) \equiv (1 + T)^n (x_1, x_2, \ldots, x_r) \pmod 2.$$

Now, since there are finitely many $x \in \mathbb{F}_2^r$, it follows that there exists $n \ge 1$
such that $(1 + T)^n x = 0$ for all $x \in \mathbb{F}_2^r$ (simply choose $n_x$ for each $x$ and put
$n = \max n_x$). We claim that this forces $r$ to be a power of 2. Notice that the
minimal polynomial of $T$ as an endomorphism of $\mathbb{F}_2^r$ is $X^r + 1$ ($T$ is simply the
shift, so we can easily compute $P(T)$ for any polynomial $P$). Thus, we must
have $X^r + 1 \mid (X + 1)^n$ in $\mathbb{F}_2[X]$. In particular, we must have $X^r + 1 = (X + 1)^j$
for some $j \le n$ and so $\binom{j}{u}$ is even for all $1 \le u \le j - 1$, forcing $j$ to be a power
of 2. But then $(1 + X)^j = 1 + X^j$ and so $r = j$ is also a power of 2. □
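
Ducci's iteration is simple to experiment with. The Python sketch below is an illustration only (the names `ducci_step` and `reaches_zero` and the step bound are assumptions of this example); it iterates $s$ and reports whether the zero vector is reached before some state repeats.

```python
from itertools import product

def ducci_step(v):
    """Apply s: replace each coordinate by |v_i - v_{i+1}|, indices taken cyclically."""
    r = len(v)
    return tuple(abs(v[i] - v[(i + 1) % r]) for i in range(r))

def reaches_zero(v, max_steps=1000):
    """True if iterating s from v hits the zero vector, False if a state repeats
    first, None if neither happens within max_steps."""
    seen = set()
    v = tuple(v)
    for _ in range(max_steps):
        if not any(v):
            return True
        if v in seen:
            return False
        seen.add(v)
        v = ducci_step(v)
    return None

if __name__ == "__main__":
    print(reaches_zero((0, 0, 1)))      # r = 3 is not a power of 2
    print(all(reaches_zero(v) for v in product(range(6), repeat=4)))  # r = 4
```

For $r = 3$ the vector $(0, 0, 1)$ falls into the cycle $(0,1,1) \to (1,0,1) \to (1,1,0) \to (0,1,1)$ and never vanishes, whereas for $r = 4$ every starting vector tried reaches zero.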


Before discussing the next problems, we need some preliminaries. First,
an easy but fundamental remark: suppose that $f \in \mathbb{Z}[X_1, X_2, \ldots, X_n]$ satisfies
$f(x_1, x_2, \ldots, x_n) = 0$ for all integers $x_i$. Then an easy induction on $n$ shows
that all coefficients of $f$ are zero. Thus, for all fields $K$ and all $x_1, x_2, \ldots, x_n \in
K$ we have $f(x_1, x_2, \ldots, x_n) = 0 \in K$. Let us apply this observation to
establish the following


Theorem 12.2. Let $K$ be a field and let $A \in M_n(K)$ be such that $a_{ij} + a_{ji} = 0$
for all $i, j$. If $n$ is odd and if $a_{ii} = 0$ for all $i$, then $\det(A) = 0$.


Proof. Consider the polynomial

$$f(a_{12}, a_{13}, \ldots, a_{n-1,n}) = \begin{vmatrix} 0 & a_{12} & a_{13} & \cdots & a_{1n} \\ -a_{12} & 0 & a_{23} & \cdots & a_{2n} \\ -a_{13} & -a_{23} & 0 & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -a_{1n} & -a_{2n} & -a_{3n} & \cdots & 0 \end{vmatrix} \in \mathbb{Z}[a_{12}, a_{13}, \ldots, a_{n-1,n}].$$

By the previous discussion, it suffices to prove that $f$ vanishes when
evaluated at any $a_{ij} \in \mathbb{Z}$. But if $X \in M_n(\mathbb{Z})$ is antisymmetric, then

$$\det(X) = \det(X^t) = \det(-X) = (-1)^n \det(X) = -\det(X),$$

the first equality being standard, the second one the hypothesis and the last
one a consequence of the fact that $n$ is odd. Thus $\det(X) = 0$ for all
antisymmetric matrices $X \in M_n(\mathbb{Z})$. But this is precisely what we wanted to
prove. □
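
Theorem 12.2 can be observed on concrete integer matrices. The Python sketch below is an illustration with assumed helper names `det` and `antisymmetric`; it evaluates the permutation-sum determinant (only practical for small $n$) on antisymmetric matrices of odd and even size.

```python
from itertools import permutations

def det(a):
    """Determinant via the signed sum over all permutations (small n only)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def antisymmetric(n, entry):
    """n x n integer matrix with a[i][j] = entry(i, j) for i < j, a[j][i] = -a[i][j]
    and zero diagonal."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = entry(i, j)
            a[j][i] = -entry(i, j)
    return a

if __name__ == "__main__":
    entry = lambda i, j: i + 2 * j + 1  # arbitrary integer entries above the diagonal
    for n in (2, 3, 4, 5):
        print(n, det(antisymmetric(n, entry)))
```

The determinant vanishes for $n = 3$ and $n = 5$, while for even $n$ it is in general nonzero, so the parity hypothesis cannot be dropped.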


We consider next two applications of the previous theorem. 


8. Suppose that n teams compete in a tournament wherein each team plays 
against any other team exactly once. In each game, 2 points are given 
to the winner, 1 point for a draw, and 0 points for the loser. It is known 
that for any subset S of teams, one can find a team (possibly in S) whose 
total score in the games with teams in S is odd. Prove that n is even. 


D. Karpov, Russian Olympiad 1972 




Proof. Define the matrix $A = (a_{ij})$ by $a_{ij} = 1$ if there is a draw between $i, j$
and 0 otherwise. Clearly, $a_{ii} = 0$ and $a_{ij} = a_{ji}$ for all $i, j$. See $A$ as a matrix
with coefficients in $\mathbb{F}_2$. We claim that $A$ is invertible. Indeed, otherwise we
can find $x \in \mathbb{F}_2^n$ nonzero such that $Ax = 0$. If $J = \{1 \le i \le n \mid x_i = 1\}$, then
the condition $Ax = 0$ can be written $\sum_{j \in J} a_{ij} = 0$ for all $1 \le i \le n$. Since
each match which is not a draw yields an even number of points, the above
condition implies that for all $i$, the total score in the games between $i$ and
the teams in $J$ is even, contradicting the hypothesis. Now, it is enough to use
theorem 12.2 to conclude that $n$ is even. □


The following problem is essentially the same as the previous one, but 
written in different language. 


9. A simple graph has the property: given any nonempty set H of its 
vertices, there is a vertex x of the graph such that the number of edges 
connecting x with the points in H is odd. Prove that the graph has an 
even number of vertices. 


Proof. If $A$ is the adjacency matrix of the graph, seen as a matrix over $\mathbb{F}_2$,
then the hypothesis of the problem translates into: for all nonzero $x \in \mathbb{F}_2^n$ we
have $Ax \ne 0$. Thus $A$ is invertible over $\mathbb{F}_2$. Since $A$ is clearly antisymmetric
(because it is symmetric and we are in characteristic two) with zero diagonal,
the result follows from theorem 12.2. □


The following problem is rather easy. 


10. Let $A_1, A_2, \ldots, A_n$ be subsets of $A = \{1, 2, \ldots, n\}$ such that for any
nonempty subset $T$ of $A$, there is $i \in A$ such that $|A_i \cap T|$ is odd.
Suppose that $B_1, B_2$ are subsets of $A$ such that $|A_i \cap B_1| = |A_i \cap B_2| = 1$
for all $i$. Prove that $B_1 = B_2$.


Gabriel Dospinescu, Mathematical Reflections


Proof. Let's consider the subsets of $A$ as vectors in $\mathbb{F}_2^n$, where the entry in the
$i$-th position is 1 if the subset contains the element $i$, 0 otherwise. Let the vectors
corresponding to the $A_i$'s form the rows of the matrix $M$. The first condition can
be expressed as $Mv \ne 0$ for all $v \ne 0$. Let $v$ be the vector all of whose
components are 1. The second condition says that $M b_i = v$ for $i = 1, 2$, where
the $b_i$ are the vectors representing the $B_i$'s. So we have $M(b_1 - b_2) = 0$, whence
$b_1 = b_2$ and $B_1 = B_2$. □


12.3 Applications of bilinear algebra 


In problems concerning incidences, it is sometimes useful to consider
properties of inner products and bilinear algebra. If $K$ is any field, we
define the inner product of two vectors $x = (x_1, x_2, \ldots, x_n) \in K^n$ and
$y = (y_1, y_2, \ldots, y_n) \in K^n$ as

$$\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

If $K = \mathbb{R}$, we let $\|x\| = \sqrt{\langle x, x \rangle}$ and call it the norm of $x$. The Cauchy-Schwarz
inequality is equivalent to the triangle inequality $\|x + y\| \le \|x\| + \|y\|$.

Let us start with a rather standard problem, which is nearly optimal. As 
the remark following the proof of it shows, this problem is actually rather 
subtle. We will see some variants of it later in the chapter. 


11. A handbook classifies plants by 100 attributes (each plant either has a 
given attribute or does not have it). Two plants are dissimilar if they 
differ in at least 51 attributes. Show that the handbook cannot give 51 
plants all dissimilar from each other. 


Tournament of the Towns 1993 


Proof. More generally, assume that the handbook considers $2n$ attributes, that
two plants are dissimilar if they differ in at least $n + 1$ of these, and that we
have $n + 1$ plants all dissimilar from each other. For each $i$, let $v_i$ be the vector
whose $j$-th coordinate is 1 if plant $i$ has attribute $j$ and $-1$ otherwise. Let $\langle \cdot, \cdot \rangle$
be the standard inner product and observe that the hypothesis implies that
$\langle v_i, v_j \rangle < 0$ for all $i \ne j$. It is however clear that $\langle v_i, v_j \rangle$ is an even integer,
thus we actually have $\langle v_i, v_j \rangle \le -2$ for all $i \ne j$. But then

$$\Bigl\| \sum_{i=1}^{n+1} v_i \Bigr\|^2 = \sum_{i=1}^{n+1} \|v_i\|^2 + 2 \sum_{i < j} \langle v_i, v_j \rangle \le 2n(n + 1) - 4 \binom{n + 1}{2} = 0.$$

This implies that $v_1 + v_2 + \cdots + v_{n+1} = 0$. If $n$ is odd, then we are quite stuck,
but if $n$ is even, then the last equality certainly cannot hold, as each component
of $v_1 + v_2 + \cdots + v_{n+1}$ is an odd number. Fortunately, $n = 50$ is even in our
problem and so we are done. □


Proof. Let us assume that the book has 51 pairwise dissimilar plants, call them
$a_1, a_2, \ldots, a_{51}$, and call the attributes $P_1, P_2, \ldots, P_{100}$. We will count triples
$(a_i, a_j, P_k)$ with $i < j$ so that $a_i$ and $a_j$ differ on attribute $P_k$. On the one
hand, for each $(a_i, a_j)$ there are at least 51 triples containing this pair, so the
total number of triples is at least $51 \binom{51}{2}$. On the other hand, if $A_k$ is the set
of plants having attribute $P_k$, then there are $|A_k|(51 - |A_k|) \le 25 \cdot 26$ pairs
$(a_i, a_j)$ living in triples with attribute $P_k$. We deduce that

$$100 \cdot 25 \cdot 26 \ge 51 \cdot \binom{51}{2} \iff 2600 \ge 2601,$$

a contradiction. □


Remark 12.3. A Hadamard matrix is a square matrix with entries $\pm 1$ such
that the rows are pairwise orthogonal vectors. It is a famous conjecture that
for all n € 4N there exists an n x n Hadamard matrix. Assume that n + 1 
is even and suppose that the previous conjecture holds. Thus we can find 
a Hadamard matrix of order 2n + 2. By multiplying some rows by —1, we 
may assume that all entries of the first column are 1. Since the columns are 
orthogonal, each column except the first now has n+ 1 positive and n+ 1 
negative terms. Now consider all the rows in which the first two terms are 
+1. There are n + 1 such rows. Take these rows and chop away the first two 
columns. The result is n+ 1 vectors of length 2n such that the inner product 
of every pair of vectors is —2. Interpreting the rows as plants and the columns 
as attributes, we get the required set of n + 1 plants which pairwise differ 
in strictly more than half of their 2n attributes (in fact, every pair differs in 
exactly n + 1 attributes). One can actually exhibit Hadamard matrices of 
order 104, yielding 52 plants which pairwise differ in 52 out of 102 attributes. 
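
The construction in this remark can be tried on a small Hadamard matrix that is known to exist. The Python sketch below uses Sylvester's doubling construction (a standard way to build Hadamard matrices of order a power of 2, chosen here for illustration and not mentioned in the text); the function names are assumptions. With order $2n + 2 = 8$ we have $n = 3$, so the recipe should give $n + 1 = 4$ plants with $2n = 6$ attributes.

```python
def sylvester(order):
    """Sylvester's Hadamard matrix of the given order (assumed to be a power of 2).
    Its first column consists only of +1 entries."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def plants_from_hadamard(h):
    """Rows whose first two entries are +1, with the first two columns removed."""
    return [row[2:] for row in h if row[0] == 1 and row[1] == 1]

def dot(u, v):
    """Standard inner product of two integer vectors."""
    return sum(x * y for x, y in zip(u, v))

def differences(u, v):
    """Number of positions in which u and v differ."""
    return sum(1 for x, y in zip(u, v) if x != y)

if __name__ == "__main__":
    plants = plants_from_hadamard(sylvester(8))
    for i in range(len(plants)):
        for j in range(i + 1, len(plants)):
            print(dot(plants[i], plants[j]), differences(plants[i], plants[j]))
```

Every pair should print inner product $-2$ and exactly $n + 1 = 4$ differing attributes, matching the count in the remark.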


The next two problems are variations on the following nice and rather 
classical result. We will also present purely combinatorial proofs and we will 
leave the reader to decide which method is more efficient. 




Theorem 12.4. Let $v_1, v_2, \ldots, v_k$ be nonzero vectors in $\mathbb{R}^n$ such that
$\langle v_i, v_j \rangle \le 0$
for all $i \ne j$. Then $k \le 2n$.


Proof. We will induct on $n$. For $n = 1$, it is an obvious application of the
pigeonhole principle. Now, suppose that the result holds for $n - 1$ and consider
$v_1, v_2, \ldots, v_k \ne 0$ in $\mathbb{R}^n$ such that $\langle v_i, v_j \rangle \le 0$ for all $i \ne j$. Dividing each $v_i$ by
its norm and working with the new set of vectors, we may assume that the $v_i$ have
norm 1. Choose an isometry $A$ sending $v_1$ to the vector $e_1$, whose first coordinate
is 1 and all the others are 0. Working with the vectors $Av_1, Av_2, \ldots, Av_k$
instead of $v_1, v_2, \ldots, v_k$ (they satisfy the same hypothesis, as an isometry is
invertible and preserves the inner product), we may assume that $A$ is the identity
and so $v_1 = e_1$. Then the first coordinate of each of the vectors $v_2, v_3, \ldots, v_k$
is non-positive by hypothesis. Consider the vectors $w_2, w_3, \ldots, w_k \in \mathbb{R}^{n-1}$
obtained by deleting the first coordinate of $v_2, v_3, \ldots, v_k$. They are the
projections of $v_2, v_3, \ldots, v_k$ on the orthogonal complement to $e_1$. Since the first
coordinate of each of $v_2, v_3, \ldots, v_k$ is non-positive and since $\langle v_i, v_j \rangle \le 0$ for
$2 \le i \ne j \le k$, we also have $\langle w_i, w_j \rangle \le 0$ for all $i \ne j$. If $w_2, w_3, \ldots, w_k$ are
nonzero we obtain $k \le 2n - 1 < 2n$ by the inductive hypothesis. So, assume
that some $w_i$'s are zero. We claim that at most one of them is zero. If we
manage to prove this, it will follow that $k - 2 \le 2(n - 1)$ and the inductive
step will be proved. So, assume that $w_2 = w_3 = 0$; then $v_1, v_2, v_3$ are multiples
of $e_1$ and since they have norm 1, each of them is equal to $e_1$ or $-e_1$. But then
we can find $1 \le i < j \le 3$ such that $v_i = v_j$. As $\langle v_i, v_j \rangle \le 0$, it follows that
$v_i = 0$, a contradiction. The result follows. □


12. An m x n matrix is filled with 0 and 1 such that any two rows differ in 
at least n/2 positions. Prove that m < 2n. 


Iranian Olympiad 
Proof. It is more convenient to replace all zeros in the table by $-1$. Of
course, this changes neither the hypothesis nor the conclusion of the problem.
See each row as a vector in $\mathbb{R}^n$ and let $v_1, v_2, \ldots, v_m$ be
these vectors. For any


12.3. Applications of bilinear algebra 555 


$i \ne j$, the inner product $\langle v_i, v_j\rangle$ is simply the
difference between the number of positions where $v_i$ and $v_j$ coincide and
the number of positions where the two vectors differ. Taking into account the
hypothesis, we deduce that $\langle v_i, v_j\rangle \le 0$ for all $i \ne j$.
Of course, $v_i \ne 0$ for all $i$. The result follows from theorem 12.4. □


Proof. The crucial ingredient in the proof is the following


Lemma 12.5. If in an $m \times n$ binary matrix every two rows differ in at
least $d$ positions and if $2d > n$, then $m \le \frac{2d}{2d - n}$.


Proof. Call a pair $(x, y)$ of matrix entries good if $x$ and $y$ are in the
same column and if $(x, y) = (0, 1)$. If $d_i$ is the number of ones in the
$i$-th column, then clearly there are $\sum_{i=1}^{n} d_i(m - d_i)$ good
pairs. Since $d_i(m - d_i) \le \frac{m^2}{4}$, there are at most
$n \cdot \frac{m^2}{4}$ good pairs. However, if we fix two rows, we find at
least $d$ good pairs, by hypothesis. Thus we find at least $d\binom{m}{2}$
good pairs and so
$$d\binom{m}{2} \le n \cdot \frac{m^2}{4},$$
which immediately yields the lemma. □


Passing now to the proof, let $A_1$ be the number of rows whose first element
is 1 and let $A_2$ be the number of rows whose first element is 0. If we
consider only the rows whose first element is 1 and then delete the first
element in each such row, we obtain a set of rows such that any two differ in
at least $\frac{n}{2}$ positions. By the previous lemma, we deduce that
$A_1 \le n$ and similarly $A_2 \le n$. Since $m = A_1 + A_2$, the result
follows. □
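The double counting behind Lemma 12.5 can be checked on a small example. The sketch below is only an illustration: the function names and the sample matrix are assumptions made for this example, not part of the original text. It counts the good pairs column by column and compares the number of rows with the bound $\frac{2d}{2d-n}$.

```python
from fractions import Fraction
from itertools import combinations

def min_row_distance(rows):
    """Smallest number of positions in which two distinct rows differ."""
    return min(sum(a != b for a, b in zip(r, s))
               for r, s in combinations(rows, 2))

def good_pairs_by_columns(rows):
    """Count pairs (0, 1) lying in a common column: sum of d_i * (m - d_i)."""
    m = len(rows)
    return sum(sum(col) * (m - sum(col)) for col in zip(*rows))

def lemma_bound(n, d):
    """Upper bound 2d / (2d - n) on the number of rows, valid when 2d > n."""
    assert 2 * d > n
    return Fraction(2 * d, 2 * d - n)

# Hypothetical example: 4 rows of length 3, any two differing in 2 places.
rows = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
m, n = len(rows), len(rows[0])
d = min_row_distance(rows)
```

In this example the bound of the lemma is attained, which shows that the inequality cannot be improved in general.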


13. Let $n$ be a positive integer. If $u = (u_i)_{i=1}^{2^n}$ and
$v = (v_i)_{i=1}^{2^n}$ are binary sequences of length $2^n$, let the distance
between them be

$$d(u, v) = \sum_{i=1}^{2^n} |u_i - v_i|.$$

Find the greatest number of binary sequences of length $2^n$ whose pairwise
distances are at least $2^{n-1}$.


China TST 2005 




Proof. Replacing each coordinate 0 by $-1$ in each vector, we must find the
maximal number of vectors of length $2^n$ with coordinates $\pm 1$ and whose
pairwise inner products are nonpositive. Theorem 12.4 shows that this maximal
number is at most $2^{n+1}$. The problem is thus solved if we can exhibit
$2^{n+1}$ vectors with coordinates $\pm 1$ in $\mathbb{R}^{2^n}$ and with
nonpositive inner products. We will consider the vectors
$\pm e_1, \pm e_2, \ldots, \pm e_{2^n}$, where $e_1, e_2, \ldots, e_{2^n}$ is
an orthogonal basis of $\mathbb{R}^{2^n}$ and all $e_i$ have coordinates
$\pm 1$. Such a basis can be constructed by induction: for $n = 1$ choose
$e_1 = (1, 1)$ and $e_2 = (1, -1)$. Assuming that we have such a basis for
$n$, consider the vectors $f_i = (e_i, e_i)$ for $1 \le i \le 2^n$ (where
$(x, y)$ is the vector obtained by considering the coordinates of $x$ followed
by the coordinates of $y$) and $f_i = (-e_{i-2^n}, e_{i-2^n})$ for
$2^n + 1 \le i \le 2^{n+1}$. We can immediately check that these vectors are
linearly independent and by construction they are orthogonal. The inductive
step is proved. □
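The inductive construction can be turned into a short program. The following sketch (the function names are illustrative assumptions) builds the orthogonal $\pm 1$ basis by the doubling step of the proof, then converts the $2^{n+1}$ vectors $\pm e_i$ back to binary sequences so that the pairwise distances can be inspected.

```python
def sylvester_basis(n):
    """Orthogonal basis of R^(2^n) with entries +-1, built as in the proof."""
    basis = [(1, 1), (1, -1)]
    for _ in range(n - 1):
        doubled = [e + e for e in basis]                     # f_i = (e_i, e_i)
        twisted = [tuple(-c for c in e) + e for e in basis]  # f_i = (-e_i, e_i)
        basis = doubled + twisted
    return basis

def extremal_sequences(n):
    """2^(n+1) binary sequences of length 2^n coming from the vectors +-e_i."""
    vectors = []
    for e in sylvester_basis(n):
        vectors.append(e)
        vectors.append(tuple(-c for c in e))
    # Map the coordinate -1 back to 0 to recover binary sequences.
    return [tuple((c + 1) // 2 for c in v) for v in vectors]

def distance(u, v):
    """The distance d(u, v) of the problem: number of differing positions."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

For instance, `extremal_sequences(3)` returns 16 sequences of length 8; two of them are either complementary or differ in exactly 4 positions.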


Remark 12.6. A more general question is the following: given n and k, what is 
the maximal number of binary sequences of length n such that any two differ 
in at least k places? For some nontrivial cases of this problem, we refer the 
reader to chapter 10 of [7]. 


Here is another very classical and nontrivial result. There are purely 
combinatorial proofs, but they are really not very illuminating. On the other 
hand, the algebraic proof is very neat. 


14. Let $A_1, A_2, \ldots, A_m$ be distinct subsets of a set $A$ with
$n \ge 2$ elements. Suppose that any two of these subsets have exactly one
common element. Prove that $m \le n$.


Fisher’s inequality 


Proof. We may assume that $A = \{1, 2, \ldots, n\}$. Following the usual
technique for this kind of problem, consider the incidence matrix $M$ of the
sets, that is the $m \times n$ matrix defined by $m_{ij} = 1$ if $j \in A_i$
and 0 otherwise. By hypothesis, $M \cdot M^t$ is the matrix with $i$-th row
$(1, 1, \ldots, |A_i|, 1, \ldots, 1)$, the entry $|A_i|$ being in position
$i$.

For technical reasons, we have to discuss two cases. The first case is when
$|A_i| = 1$ for some $i$. But then by assumption all $A_j$ have to contain
$A_i$ and so the sets $A_j \setminus A_i$ for $j \ne i$ are pairwise disjoint
nonempty subsets of a set with $n - 1$ elements, thus certainly
$m - 1 \le n - 1$ and we are done. The hard case is when all $|A_i| \ge 2$ and
the key point (to be proved below) is that in this case $M \cdot M^t$ is
invertible, thus of rank $m$. Since the rank of $M \cdot M^t$ is at most that
of $M$, which is at most $n$, this will be enough to prove the result.

In order to prove the key point, we will argue by contradiction: if
$M \cdot M^t$ is not invertible, there exists a nonzero vector $x$ with real
coordinates $x_1, x_2, \ldots, x_m$ such that $M \cdot M^t x = 0$. But then
${}^t x\, M \cdot M^t x = 0$ and so

$$\sum_{i=1}^{m} |A_i| x_i^2 + 2 \sum_{1 \le i < j \le m} x_i x_j = 0.$$

This can be also written as

$$\sum_{i=1}^{m} (|A_i| - 1) x_i^2 + \left(\sum_{i=1}^{m} x_i\right)^2 = 0.$$

Since $|A_i| \ge 2$ for all $i$, the last relation obviously implies that all
$x_i$ are zero, a contradiction. Thus, the claim is proved and the problem is
solved. □
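To see the key point of the proof at work, one can compute $M \cdot M^t$ and its rank for a concrete family. The sketch below is an illustration under stated assumptions: the function names are hypothetical and the family chosen is the set of seven lines of the Fano plane, a standard example in which any two sets share exactly one element and $m = n = 7$. The rank is computed over the rationals with exact fractions.

```python
from fractions import Fraction

def gram_matrix(sets, n):
    """M * M^t for the incidence matrix M of the given subsets of {1,...,n}."""
    M = [[1 if j in s else 0 for j in range(1, n + 1)] for s in sets]
    return [[sum(a * b for a, b in zip(r, s)) for s in M] for r in M]

def rank(matrix):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

# Illustrative family: the seven lines of the Fano plane on the points 1..7.
fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
G = gram_matrix(fano, 7)
```

Here the entry of `G` in position $(i, j)$ is $|A_i \cap A_j|$, so `G` has the shape described in the proof, and its rank equals the number of sets.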


Remark 12.7. One can also prove the invertibility of $M \cdot M^t$ in a
different way, if $|A_i| \ge 2$ for all $i$. Namely, if $M \cdot M^t x = 0$,
then $(|A_i| - 1)x_i + S = 0$, where $S = x_1 + x_2 + \cdots + x_m$. But then
$x_i = -\frac{S}{|A_i| - 1}$ and so

$$S = -S \cdot \sum_{i=1}^{m} \frac{1}{|A_i| - 1},$$

which obviously implies that $S = 0$ and so all $x_i$ are zero.


Remark 12.8. We leave as an exercise to the reader to adapt the previous proof
and deduce the following more general result: let $A_1, A_2, \ldots, A_m$ be
different subsets of a set $X$ with $n$ elements. If there is
$a \in \{1, 2, \ldots, n - 1\}$ such that $|A_i \cap A_j| = a$ for all
$i \ne j$, then $m \le n$.


It is also very hard to solve the following problem without linear algebra. 
On the other hand, even with the help of (bi)linear algebra, this problem is 
quite tricky. 




15. Let $A_1, A_2, \ldots, A_m$ and $B_1, B_2, \ldots, B_p$ be families of
distinct subsets of $\{1, 2, \ldots, n\}$ such that $|A_i \cap B_j|$ is an
odd number for all $i$ and $j$. Then $mp \le 2^{n-1}$.

Benny Sudakov 
Proof. Let $v_i$ be the incidence vector of $A_i$ and let $w_j$ be the
incidence vector of $B_j$, seen as vectors in $\mathbb{F}_2^n$. The hypothesis
becomes: $\langle v_i, w_j\rangle = 1$ for all $i, j$, where
$\langle\,,\rangle$ is the standard inner product on $\mathbb{F}_2^n$. Let $V$
be the vector space spanned by the vectors $w_j$. Since the vectors
$v_i + v_1$ are $m$ distinct vectors each orthogonal to every $w_j$, thus to
$V$, we have

$$m \le 2^{\dim \mathrm{Span}(v_i + v_1)} \le 2^{n - \dim \mathrm{Span}(w_j)}.$$

Moreover, since $\langle v_1, w_j\rangle = 1$ and
$\langle v_1, w_k + w_1\rangle = 0$ for all $j, k$, the sets
$$\{w_i \mid 1 \le i \le p\} \quad \text{and} \quad \{w_i + w_1 \mid 1 \le i \le p\}$$
are disjoint subsets of $\mathrm{Span}(w_j)$. Thus
$2p \le 2^{\dim \mathrm{Span}(w_j)}$. Multiplying these two inequalities
yields the desired result. □
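For very small $n$ the inequality can be confirmed by exhaustive search. The sketch below is a brute-force check and not the method of the proof; its function name and the bitmask encoding of subsets are assumptions made for this example. For each family $A$ the largest admissible family $B$ is simply the set of all subsets meeting every $A_i$ in an odd number of elements.

```python
def best_product(n):
    """Largest value of m*p over all families A, B of subsets of {0,...,n-1}
    such that every |A_i ∩ B_j| is odd (subsets are encoded as bitmasks)."""
    subsets = range(1 << n)
    best = 0
    for family_bits in range(1, 1 << (1 << n)):  # every nonempty family A
        family = [s for s in subsets if (family_bits >> s) & 1]
        # The largest admissible B: all sets meeting every A_i oddly.
        partners = [t for t in subsets
                    if all(bin(s & t).count("1") % 2 == 1 for s in family)]
        best = max(best, len(family) * len(partners))
    return best
```

The search space has $2^{2^n}$ families, so this is only practical for $n \le 4$. The bound is attained, for example by taking for $A$ a single one-element set and for $B$ all sets containing that element.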


The next three problems also require a preliminary discussion. Namely, 
we will recall the proof of the following classical, but nontrivial result: 


Theorem 12.9. Let $A \in M_n(\mathbb{F}_2)$ be a symmetric matrix and let $d$
be the column vector whose coordinates are the entries on the main diagonal of
$A$. Then there exists $x \in \mathbb{F}_2^n$ such that $Ax = d$.


The easiest proof uses bilinear algebra: consider the standard inner product
on $\mathbb{F}_2^n$ and observe that it suffices to prove that any vector
which is orthogonal to $\mathrm{Im}(A)$ is also orthogonal to $d$. But if $v$
is such a vector, we have

$$\sum_{i=1}^{n} \left(\sum_{j=1}^{n} a_{ij} w_j\right) v_i = 0$$

for all $w_j \in \mathbb{F}_2$ and by exchanging the two sums we deduce that

$$\sum_{i=1}^{n} a_{ij} v_i = 0$$

for all $j$. But then we also have

$$\sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij} v_i v_j = 0,$$

which can be written (taking into account that $a_{ij} + a_{ji} = 0$)

$$\sum_{i=1}^{n} a_{ii} v_i^2 = 0.$$

As $v_i^2 = v_i$, we are done.


16. Light bulbs $L_1, L_2, \ldots, L_n$ are controlled by switches
$S_1, S_2, \ldots, S_n$. Switch $S_i$ changes the on/off status of light $L_i$
and possibly the status of some other lights. Suppose that if $S_i$ changes
the status of $L_j$ then $S_j$ changes the status of $L_i$. Initially all
lights are off. Is it possible to operate the switches in such a way that all
the lights are on?


Uri Peled, AMM 10197 


Proof. This is a very easy application of the previous result: define
$a_{ij} = 1$ if $S_i$ changes the status of $L_j$ and 0 otherwise. By
hypothesis, we have $a_{ij} = a_{ji}$ and $a_{ii} = 1$. Thus the matrix
$A = (a_{ij})$, seen as a matrix over $\mathbb{F}_2$, is symmetric with
diagonal elements 1. By the previous theorem, we can find a nonzero vector
$x \in \mathbb{F}_2^n$ such that $Ax$ is the diagonal vector of $A$. Thus,
$\sum_{j=1}^{n} a_{ij} x_j = 1$ for all $i$. If $J$ is the set of those $j$
with $x_j = 1$, the previous equality tells us that if we operate the switches
$S_j$ with $j \in J$, then the state of each $L_i$ will change an odd number
of times and so all lights $L_i$ will be on. □
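The proof is constructive as soon as one can solve a linear system over $\mathbb{F}_2$. The following sketch does this by Gaussian elimination; the names `solve_gf2` and `switches_to_press` are hypothetical helpers introduced for this illustration, and the switch matrix is assumed to be given as a list of 0/1 rows.

```python
def solve_gf2(A, d):
    """Return x with A x = d over F_2, or None if the system is inconsistent."""
    n = len(A)
    # Augmented rows: coefficients followed by the right-hand side, mod 2.
    rows = [[a % 2 for a in A[i]] + [d[i] % 2] for i in range(n)]
    pivots = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, n) if rows[i][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(n):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[n] for row in rows[r:]):
        return None  # a zero row with right-hand side 1
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = rows[i][n]  # free variables are set to 0
    return x

def switches_to_press(A):
    """For a symmetric 0/1 matrix A with ones on the diagonal, return a 0/1
    vector x such that pressing the switches j with x[j] = 1 lights every bulb."""
    return solve_gf2(A, [A[i][i] for i in range(len(A))])
```

Theorem 12.9 guarantees that `solve_gf2` never returns `None` when `A` is symmetric and `d` is its diagonal; for a general right-hand side the system may of course be inconsistent.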


The following problem is a bit trickier. We also give a purely combinatorial 
proof, to see the difference... 


17. Let G be a graph. Prove that the set of its vertices can be partitioned in 
two groups (possibly empty) such that each group induces a subgraph 
in which all vertices have even degree. 


Gallai’s Cycle-Cocycle partition theorem 




Proof. Consider the adjacency matrix $A$ of the graph and perform the
following operations on it: for each vertex $i$ of odd degree, add a 1 in
position $(i, i)$ of $A$. For all the other vertices $i$, leave position
$(i, i)$ as in the original matrix (i.e. equal to 0). We thus get a new
symmetric binary matrix $B = (b_{ij})$. By the previous theorem, there exists
$x \in \mathbb{F}_2^n$ such that

$$\sum_{j=1}^{n} b_{ij} x_j = b_{ii}$$

for all $i$. Let $V_1 = \{1 \le i \le n \mid x_i = 1\}$ and let $V_2$ be its
complement. We claim that this partition of the vertices satisfies the desired
conditions. Indeed, for $i \in V_1$ we have
$\sum_{j \in V_1 \setminus \{i\}} b_{ij} = 0$, so $i$ has an even degree in
the subgraph induced by $V_1$. For $i \in V_2$, we have
$\sum_{j \in V_1} b_{ij} = b_{ii}$, hence
$\sum_{j \in V_1 \cup \{i\}} b_{ij} = 0$ and

$$\sum_{j \in V_2 \setminus \{i\}} b_{ij} = \sum_{j=1}^{n} b_{ij} - \sum_{j \in V_1 \cup \{i\}} b_{ij} = \sum_{j=1}^{n} b_{ij}$$

and the last quantity is 0 by construction. Thus $i$ has even degree in $V_2$
and we are done. □


We continue with a nice variation on the previous result. 


18. At a certain mathematical conference, every pair of mathematicians are 
either friends or strangers. At mealtime, every participant eats in one 
of two large dining rooms. Each mathematician insists upon eating in a 
room which contains an even number of his or her friends. Prove that 
the number of ways that the mathematicians may be split between the 
two rooms is a power of two.


USAMO 2008 


Proof. Note that this problem implies the existence of at least one admissible
partition (i.e. one that satisfies the conditions of the problem), which is
precisely the content of the previous problem. Fortunately, the hard work was
actually done. Let $G$ be the graph whose vertices are the mathematicians,
two vertices being connected if the mathematicians are friends. Suppose that
$G$ has $n$ vertices $1, 2, \ldots, n$ and consider the matrix $B$ as in the
solution of the previous problem. We claim that there is a natural bijection
between admissible partitions and solutions of the system $Bx = d$, where $d$
is the main diagonal of $B$. Indeed, let us identify admissible partitions
$C_0, C_1$ with vectors $x \in \mathbb{F}_2^n$, where $x_i = 0$ if
$i \in C_0$ and $x_i = 1$ if $i \in C_1$. The admissibility condition can be
written: for all $i$ we have

$$\sum_{j \in C(i),\ j \text{ friend of } i} 1 \equiv 0 \pmod 2,$$

where $C(i)$ is the class of the partition containing $i$. Just as in the
proof of the previous problem, this condition can also be written

$$\sum_{j=1}^{n} b_{ij} x_j = b_{ii}.$$

Thus, the number of admissible partitions is the number of vectors
$x \in \mathbb{F}_2^n$ such that $Bx = d$. We know that there is at least one
such vector $x_0$ (by the discussion preceding problem 16 or by the previous
problem). But then

$$\{x \mid Bx = d\} = x_0 + \{x \mid Bx = 0\}$$

and $\{x \mid Bx = 0\} = \mathrm{Ker}(B)$ is a linear subspace of
$\mathbb{F}_2^n$, so it has cardinality $2^{\dim \mathrm{Ker}(B)}$. Since this
is a power of 2, the conclusion follows. □
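The statement can be tested by brute force on small graphs, without any linear algebra. The sketch below (function names are illustrative assumptions) enumerates all assignments of the mathematicians to the two rooms and counts the admissible ones.

```python
from itertools import product

def count_admissible_splits(n, edges):
    """Number of assignments of the vertices 0..n-1 to rooms 0/1 in which
    every vertex has an even number of friends in its own room."""
    neighbours = [set() for _ in range(n)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    count = 0
    for rooms in product((0, 1), repeat=n):
        if all(sum(rooms[w] == rooms[v] for w in neighbours[v]) % 2 == 0
               for v in range(n)):
            count += 1
    return count

def is_power_of_two(k):
    """True for 1, 2, 4, 8, ..."""
    return k > 0 and (k & (k - 1)) == 0
```

For a triangle of mutual friends, for instance, the only admissible assignments put all three people in the same room, which gives $2 = 2^1$ possibilities.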


12.4 Matrix equations 


To fully appreciate the power of linear algebra in combinatorics, the reader 
should try to find a purely combinatorial solution to the following problem. 


19. Let $G_1, G_2, \ldots, G_k$ be complete bipartite subgraphs of the
complete graph $K_{2n}$ with $2n$ vertices. Assume that every edge of $K_{2n}$
is contained in an odd number of subgraphs $G_1, G_2, \ldots, G_k$. Prove that
$k \ge n$.


Proof. Let $1, 2, \ldots, 2n$ be the vertices of $K_{2n}$ and let $L_i, R_i$
be disjoint subsets of $\{1, 2, \ldots, 2n\}$ such that the set of vertices of
$G_i$ is $L_i \cup R_i$ and two vertices are connected if and only if one is
in $L_i$ and the other is in $R_i$. If $l_i \in \mathbb{F}_2^{2n}$ is the
vector whose nonzero coordinates are precisely those on positions belonging to
$L_i$ and if $r_i$ is similarly defined, then the entry in position $(u, v)$
of the matrix $l_i \cdot r_i^t$ is precisely $1_{u \in L_i,\, v \in R_i}$. We
deduce that the entry in position $(u, v)$ of

$$X = \sum_{i=1}^{k} (l_i r_i^t + r_i l_i^t) \in M_{2n}(\mathbb{F}_2)$$

is 0 if $u = v$ and it is the number of subgraphs containing the edge $uv$,
taken mod 2, otherwise. Therefore, the hypothesis can be reformulated as an
equality of matrices $X = J - I_{2n}$, where $J$ is the matrix all of whose
entries are 1. Now, matrices of the form $v_1 \cdot v_2^t$ are precisely the
matrices of rank at most 1. So, we expressed $X$ as a sum of $2k$ matrices of
rank at most 1. Since the rank function is sub-additive (thought of in terms
of endomorphisms, this is obvious: the image of $u + v$ is contained in the
sum of the images of $u$ and $v$ for any endomorphisms $u, v$ of a vector
space), in order to solve the problem it is enough to prove that
$J - I_{2n}$ has rank $2n$, i.e. that it is invertible. This is however
trivial: if $Jx = x$ for some vector $x$, then
$x_i = x_1 + x_2 + \cdots + x_{2n}$ for all $i$, so first all $x_i$'s are
equal and then $x_i = 2n x_i = 0$, since we are working in characteristic 2.
The result follows. □
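Both ingredients of the argument, the matrix $X$ attached to a covering and the rank of $J - I$ over $\mathbb{F}_2$, are easy to compute. The sketch below is an illustration with hypothetical function names; vertices are numbered from 0 and each bipartite piece is given as a pair of vertex sets.

```python
def rank_gf2(rows):
    """Rank over F_2 of a 0/1 matrix whose rows are given as lists."""
    masks = [int("".join(map(str, row)), 2) for row in rows]  # rows as bitmasks
    rank = 0
    while masks:
        pivot = masks.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        masks = [m ^ pivot if m & low else m for m in masks]
    return rank

def j_minus_identity(size):
    """The matrix J - I: zeros on the diagonal, ones elsewhere."""
    return [[0 if u == v else 1 for v in range(size)] for u in range(size)]

def cover_matrix(size, parts):
    """Sum over F_2 of l_i r_i^t + r_i l_i^t for bipartite pieces (L_i, R_i)."""
    X = [[0] * size for _ in range(size)]
    for left, right in parts:
        for u in left:
            for v in right:
                X[u][v] ^= 1
                X[v][u] ^= 1
    return X
```

For $K_4$, the three stars $(\{0\}, \{1,2,3\})$, $(\{1\}, \{2,3\})$, $(\{2\}, \{3\})$ cover every edge exactly once, so `cover_matrix` returns $J - I_4$, whose rank over $\mathbb{F}_2$ is 4. Note that the parity of the size matters: for an odd number of vertices $J - I$ is singular in characteristic 2.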


We also discussed the following beautiful problem in example 7, chapter 
23 of [3]. We give here a very natural solution using the following classical 
result: 


Theorem 12.10. Let $X \in M_n(\mathbb{R})$ and $Y \in M_m(\mathbb{R})$. There
exists a nonzero matrix $A \in M_{n \times m}(\mathbb{R})$ such that
$XA = AY$ if and only if $X, Y$ have a common eigenvalue.


Proof. Assume first that there exists a nonzero matrix
$A \in M_{n \times m}(\mathbb{R})$ such that $XA = AY$, but $X, Y$ have no
common eigenvalue. Thus the characteristic polynomials $P, Q$ of $X, Y$ are
relatively prime and so there exist $R, S \in \mathbb{C}[T]$ such that
$PR + QS = 1$. Now, since $XA = AY$, we have by the Cayley-Hamilton theorem
$0 = P(X)A = AP(Y)$. But we also have $AQ(Y) = 0$, as $Q(Y) = 0$ (again by
Cayley-Hamilton). Thus

$$A = A(PR)(Y) + A(QS)(Y) = AP(Y)R(Y) + AQ(Y)S(Y) = 0,$$

a contradiction. This settles the first part. Next, if $X, Y$ have a common
eigenvalue $\lambda$, then $X$ and $Y^t$ also have the common eigenvalue
$\lambda$ and so one can find nonzero vectors $v, w$ such that
$Xv = \lambda v$, $Y^t w = \lambda w$. It is then easy to check that
$A = v w^t$ is a nonzero complex matrix such that $XA = AY$. Then one of
$\mathrm{Re}(A)$ and $\mathrm{Im}(A)$ is a nonzero real matrix satisfying
again $XA = AY$ and we are done. □


20. A grid divides an $n \times m$ sheet of paper into unit squares. The two
sides of length $n$ are taped together to form a cylinder. Prove that it is
possible to write a real number in each square, not all zero, so that each
number is the sum of the numbers in the neighboring squares, if and only if
there exist integers $k, l$ such that $n + 1$ does not divide $k$ and

$$\cos\frac{2l\pi}{m} + \cos\frac{k\pi}{n+1} = \frac{1}{2}.$$


Ciprian Manolescu, Romanian TST 1998 


Proof. See the grid as an $n \times m$ matrix $A$ and define
$a_{0,j} = a_{n+1,j} = 0$ for all $j$. The problem becomes: there exists a
nonzero matrix $A$ such that

$$a_{i,j} = a_{i-1,j} + a_{i+1,j} + a_{i,j-1} + a_{i,j+1}$$

(for all $1 \le i \le n$, $j$ being taken mod $m$) if and only if there exist
integers $k, l$ as in the statement. The previous equality can be written very
simply by introducing the matrices $B \in M_n(\mathbb{Z})$,
$C \in M_m(\mathbb{Z})$ defined by $b_{ij} = 1$ if $|i - j| = 1$ and 0
otherwise, $c_{ij} = 1$ if $|i - j| = 1$ or
$(i, j) \in \{(1, m), (m, 1)\}$ and 0 otherwise. Indeed, the previous
relation is equivalent to $AC + BA = A$, or $BA = A(I - C)$. By theorem 12.10,
the existence of the matrix $A$ is equivalent to the existence of a common
eigenvalue of $B$ and $I - C$. This is equivalent to the existence of
eigenvalues $\lambda_1, \lambda_2$ of $B$, respectively $C$ such that
$\lambda_1 + \lambda_2 = 1$. Let us compute the eigenvalues of $C$. If
$Cx = \lambda x$ for a nonzero vector $x$, we can write

$$x_2 + x_m = \lambda x_1, \quad x_1 + x_3 = \lambda x_2, \quad \ldots, \quad x_1 + x_{m-1} = \lambda x_m$$

and so if we extend $x_i$ by $m$-periodicity, we can simply write this as

$$x_{i+2} + x_i = \lambda x_{i+1}.$$

If $r_1, r_2$ are the roots of the equation $t^2 - \lambda t + 1 = 0$, then
$x_i = C_1 r_1^i + C_2 r_2^i$ ($C_1 r_1^i + C_2 i r_1^i$ if $r_1 = r_2$).
Imposing the condition of $m$-periodicity, we see that we should have
$r_1^m = r_2^m = 1$ and (because $r_1$ and $r_2$ are reciprocal roots of
unity)

$$\lambda = r_1 + r_2 = r_1 + \overline{r_1} = 2\,\mathrm{Re}(r_1) = 2\cos\frac{2l\pi}{m}$$

for some $0 \le l < m$. This suggests the form of the eigenvalues of $C$. To
make this argument completely rigorous, it suffices to reverse the
computations: simply choose any $0 \le l < m$, set
$\lambda = 2\cos\frac{2l\pi}{m}$, $r_1 = e^{\frac{2il\pi}{m}}$ and choose
$x_j = r_1^j$. Then $x$ is an eigenvector with eigenvalue $\lambda$. Since we
have found the correct number of eigenvectors, we are done: the eigenvalues of
$C$ are $2\cos\frac{2l\pi}{m}$ (the argument was a bit long, but we wanted to
present it in this form as it is rather useful). The same kind of reasoning
shows that the eigenvalues of $B$ are $2\cos\frac{k\pi}{n+1}$ and the result
follows from these computations and the previous discussion.
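The eigenvalue computations can be checked numerically. The sketch below is an illustration in floating-point arithmetic and all names in it are assumptions. It uses the real eigenvectors $x_j = \cos\frac{2lj\pi}{m}$ for $C$ (the real part of $r_1^j$) and $x_j = \sin\frac{jk\pi}{n+1}$ for $B$ (a standard choice, not spelled out in the text), and it tests the criterion of the problem up to a small tolerance, so it is a numerical check rather than a proof.

```python
import math

def cycle_matrix(m):
    """Matrix C: c_ij = 1 if i, j are neighbours on a cycle of length m >= 3."""
    return [[1 if (i - j) % m in (1, m - 1) else 0 for j in range(m)]
            for i in range(m)]

def path_matrix(n):
    """Matrix B: b_ij = 1 if |i - j| = 1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

def residual(matrix, vector, eigenvalue):
    """Largest coordinate of |M v - lambda v|."""
    return max(abs(sum(a * x for a, x in zip(row, vector)) - eigenvalue * v)
               for row, v in zip(matrix, vector))

def cycle_eigenpair(m, l):
    """Eigenvalue 2cos(2 l pi/m) of C with the real eigenvector Re(r_1^j)."""
    angle = 2 * l * math.pi / m
    return 2 * math.cos(angle), [math.cos(angle * j) for j in range(m)]

def path_eigenpair(n, k):
    """Eigenvalue 2cos(k pi/(n+1)) of B with eigenvector sin(j k pi/(n+1))."""
    angle = k * math.pi / (n + 1)
    return 2 * math.cos(angle), [math.sin(angle * j) for j in range(1, n + 1)]

def has_solution(n, m, tol=1e-9):
    """Numerical test of the criterion: some k in 1..n and l in 0..m-1 with
    cos(2 l pi/m) + cos(k pi/(n+1)) = 1/2, up to the tolerance tol."""
    return any(abs(math.cos(2 * l * math.pi / m)
                   + math.cos(k * math.pi / (n + 1)) - 0.5) < tol
               for k in range(1, n + 1) for l in range(m))
```

For example, `has_solution(1, 6)` holds: on a $1 \times 6$ cylinder the numbers $1, 1, 0, -1, -1, 0$ written around the cycle have the required property.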


We give two proofs for the following beautiful and classical result. For 
the first proof we follow [29], while the second proof appears in [42]. 


21. In a society, acquaintance is mutual and any two persons have exactly 
one common friend. Then there is a person who knows all the others. 


The friendship theorem, Erdős, Rényi, Sós


Proof. First, we will prove that either there is a universal friend (i.e. a
person knowing all the others) or any two persons have the same number of
friends. If $A$ and $B$ are not friends, let $a_1, a_2, \ldots, a_k$ be $A$'s
friends and let $b_1, b_2, \ldots, b_l$ be $B$'s friends. By assumption, $a_i$
and $B$ have a unique common friend $b_{f(i)}$ and $b_j$, $A$ have a unique
common friend $a_{g(j)}$. Since $f$ and $g$ are clearly inverses to each other
we have $k = l$ and we are done. Consider the obvious graph of enemy
relationships in this society and suppose that it is disconnected. Then the
society can be partitioned into two sets $X$ and $Y$ such that everyone in set
$X$ is friends with everyone in set $Y$. If both $X$ and $Y$ have two people
or more, then two people in $X$ have at least two common friends in $Y$,
contradiction. On the other hand, if either set contains only one person then
that person is a universal friend. Thus, if the enemy graph is disconnected,
there must be a universal friend. On the other hand, if the enemy graph is
connected, then every person has the same number of friends, because we saw
that two enemies have the same number of friends. This proves the claim and
also shows that we may assume that the enemy graph is connected in what
follows.

Let $d$ be the number of friends of each person and let $n$ be the number of
people. Then, the number of pairs of people with a specific person as their
common friend is $\binom{d}{2}$, so by counting the number of pairs of people
in two ways we get $\binom{n}{2} = n\binom{d}{2}$ and so $n = d^2 - d + 1$.
Finally, let $A$ be the adjacency matrix of the graph. By our previous work,
we can write $A^2 = (d - 1)I + J$, where $J$ is the matrix having 1 at each
entry. Trivially, $(d - 1)I + J$ has eigenvalues $d^2$ and $d - 1$ (with
multiplicities 1 and $n - 1$ respectively). Thus the eigenvalues of $A$ are
$d$ and $\pm\sqrt{d - 1}$. Since the trace of $A$ is zero, we deduce that
$d + (a - b)\sqrt{d - 1} = 0$, where $a, b$ are the multiplicities of the
eigenvalues $\pm\sqrt{d - 1}$. Squaring the last relation, we deduce that
$d - 1$ divides $d^2$, so that $d = 1$ or $d = 2$. But both such cases
trivially satisfy the theorem, which gives the contradiction we needed. □


Proof. As in the previous solution, we prove that either there is a universal
friend or the associated graph $G$ is $d$-regular and connected, for some $d$
such that $n = d^2 - d + 1$. Now comes the beautiful idea: for a positive
integer $p$, let $c(p)$ be the number of cycles of people of length $p$ such
that each person in the cycle is friends with the next person (people can
appear multiple times in such a cycle). Any such cycle can be constructed in
the following way: pick a starting person, pick one of his friends, pick a
friend of that friend, etc., until you are at the second to last person in the
cycle. If the second to last person is different from the first person, then
the last person must be their common friend, but if the second to last person
is the same as the first then the last person can be any of the first person's
$d$ friends. Thus

$$c(p) = nd^{p-2} + (d - 1)c(p - 2).$$

If $d \ne 0$ and $d \ne 2$, pick a prime divisor $p$ of $d - 1$. Then, since
$n = d^2 - d + 1$, we have $n \equiv 1 \pmod p$ and so
$c(p) \equiv 1 \pmod p$. On the other hand, $c(p)$ is a multiple of $p$: there
are no cycles that are shifts of themselves (since no person is a friend of
himself or herself...), so the number of cycles of length $p$ must be a
multiple of $p$. The last two results are not possible at the same time, so
that we must have $d = 0$ or $d = 2$. But in this case it is clear that there
is a universal friend, which finishes the proof. □
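The recurrence for $c(p)$ can be checked on the smallest regular example, a triangle of mutual friends ($n = 3$, $d = 2$, so that $n = d^2 - d + 1$). In the sketch below the function name and the example are assumptions made for illustration. A cycle of length $p$ in the sense of the proof is a closed walk of length $p$, so $c(p)$ is the trace of the $p$-th power of the adjacency matrix.

```python
def closed_walks(adjacency, p):
    """c(p): number of closed walks of length p, i.e. the trace of A^p."""
    n = len(adjacency)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        power = [[sum(power[i][k] * adjacency[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return sum(power[i][i] for i in range(n))

# Triangle of mutual friends: n = 3 people, each with d = 2 friends.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

With these values the recurrence reads $c(p) = 3 \cdot 2^{p-2} + c(p-2)$, starting from $c(1) = 0$ and $c(2) = 6$.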


Remark 12.11. For infinitely many persons, this result does not hold anymore. 


Remark 12.12. The theorem actually classifies all graphs as in the statement: 
all of them consist of a collection of triangles sharing a vertex. 
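The graphs described in Remark 12.12 are easy to generate and test. The following sketch (all function names are hypothetical) builds a collection of triangles sharing the vertex 0, checks the hypothesis of the friendship theorem directly, and lists the universal friends.

```python
from itertools import combinations

def windmill(triangles):
    """Friend sets for `triangles` triangles sharing the common vertex 0."""
    friends = {0: set()}
    for t in range(triangles):
        a, b = 2 * t + 1, 2 * t + 2
        friends[a] = {0, b}
        friends[b] = {0, a}
        friends[0] |= {a, b}
    return friends

def is_friendship_graph(friends):
    """True if any two distinct persons have exactly one common friend."""
    return all(len(friends[u] & friends[v]) == 1
               for u, v in combinations(friends, 2))

def universal_friends(friends):
    """Persons who know everybody else."""
    return [v for v in friends if len(friends[v]) == len(friends) - 1]
```

With at least two triangles the shared vertex is the only universal friend; for a single triangle all three persons are universal friends.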


Remark 12.13. In [42] one can also find a combination of the first two
solutions, which is very elegant. Namely, first we prove that the graph is
$d$-regular for some $d$ such that $n = d^2 - d + 1$ and that the adjacency
matrix $A$ of the graph satisfies $A^2 = (d - 1)I + J$, where $J$ is the
matrix all of whose entries are 1. Suppose that $d \ge 3$ and pick a prime $p$
dividing $d - 1$. Working in $M_n(\mathbb{F}_p)$, we obtain $A^2 = J$ and
since $AJ = dJ$, we have $AJ = J$. Then $A^p = J$ and so
$\mathrm{tr}(A^p) \equiv n \pmod p$. On the other hand, an extension of
Fermat's little theorem yields
$\mathrm{tr}(A^p) \equiv (\mathrm{tr} A)^p \equiv \mathrm{tr} A \pmod p$
(working with the eigenvalues of $A$ this reduces easily to theorem 9.15). But
clearly $\mathrm{tr} A = 0$. We deduce that $p$ divides $n$, which is clearly
impossible, as $n \equiv 1 \pmod p$. Thus $d \le 2$.


We end this section with another classical result in algebraic combina- 
torics. 


22. In a graph $G$ with $n^2 + 1$ vertices suppose that every vertex has
degree $n$ and every cycle has length at least 5. Then
$n \in \{1, 2, 3, 7, 57\}$.


Hoffman-Singleton’s theorem 


Proof. The combinatorial part of the theorem is contained in 


Lemma 12.14. If x,y are vertices of G, the number of common neighbors of 
x,y is 0 if x,y are connected and 1 otherwise. 


Proof. We will count triples $(x, (y, z))$ in which $x, y, z$ are vertices and
$x$ is connected to both $y$ and $z$. Counting them according to $x$, we
obtain $(n^2 + 1)\binom{n}{2}$ triples (since there are $n^2 + 1$ vertices,
each of degree $n$). On the other hand, two connected vertices cannot belong
to a triple (otherwise we would obtain a 3-cycle in $G$) and two non-connected
vertices $y, z$ can belong to at most one triple (otherwise $G$ would have a
4-cycle). So the number of triples is at most $\binom{n^2+1}{2}$ minus the
number of edges of the graph, so at most

$$\binom{n^2+1}{2} - \frac{n(n^2+1)}{2} = (n^2 + 1)\binom{n}{2}.$$

Since we have already established that there are precisely
$(n^2 + 1)\binom{n}{2}$ triples, it follows that every two non-connected
vertices belong to exactly one triple and the lemma is proved. □


Let $A$ be the adjacency matrix of $G$ and let $J$ be the matrix all of whose
entries are 1. The previous lemma yields

$$A^2 = nI + J - I - A = (n - 1)I - A + J.$$

Next, $A$ is a symmetric matrix with real entries and zero diagonal, so it has
zero trace and it is diagonalizable in an orthonormal basis (by the spectral
theorem). In particular, all eigenvalues of $A$ are real and the eigenvectors
corresponding to different eigenvalues are orthogonal. Since the sum of
entries in each row is $n$ (because $G$ is $n$-regular), we have
$Av_1 = nv_1$, where $v_1$ is the vector all of whose coordinates are 1. So,
if $Av = \lambda v$ for some $v \ne 0$ and $\lambda \ne n$, then
$\langle v_1, v\rangle = 0$ and so $Jv = 0$. But this implies that
$\lambda^2 = n - 1 - \lambda$. So, if $r_1, r_2$ are the roots of the equation
$z^2 + z = n - 1$, any eigenvalue of $A$ different from $n$ is $r_1$ or $r_2$.

Next, we claim that $n$ has multiplicity 1 as an eigenvalue of $A$. It
suffices to prove that if $Av = nv$, then $v$ is collinear to $v_1$. This is
easy, since $Av = nv$ implies $n^2 v = A^2 v = (n - 1)v - Av + Jv$ and so
$Jv = (n^2 + 1)v$, hence all coordinates of $v$ are equal and $v$ is collinear
to $v_1$. Let $a, b$ be the multiplicities of $r_1, r_2$ as eigenvalues of
$A$. So $a + b = n^2$ by the result we have just established and
$ar_1 + br_2 + n = 0$, as $A$ has trace 0. Since

$$r_1 = \frac{-1 + \sqrt{4n - 3}}{2}, \qquad r_2 = \frac{-1 - \sqrt{4n - 3}}{2},$$

the previous relation becomes

$$-\frac{a + b}{2} + \frac{a - b}{2}\sqrt{4n - 3} + n = 0.$$

Now, we have two cases: if $\sqrt{4n - 3}$ is irrational, the last relation
implies that $a = b$ and $n = a$. Since $a + b = n^2$, this gives $n = 2$ and
we are done. If $k = \sqrt{4n - 3}$ is rational, write
$a + b = n^2 = \left(\frac{k^2 + 3}{4}\right)^2$ and replace this in the
previous relation. We obtain

$$16k(a - b) - (k^2 + 3)^2 + 8(k^2 + 3) = 0.$$

Reducing this mod $k$, we finally obtain $k \mid 15$ and since
$n = \frac{k^2 + 3}{4}$ the result follows. □
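The last step of the proof and the matrix identity $A^2 = (n-1)I - A + J$ can both be checked by a short computation. In the sketch below the function names are illustrative assumptions; the Petersen graph, realized here as the graph of disjoint pairs from a five-element set, is the classical example for $n = 3$.

```python
from itertools import combinations

def admissible_degrees():
    """Values of n allowed by the proof: n = 2 (irrational case) and
    n = (k^2 + 3)/4 for the positive divisors k of 15."""
    degrees = {2}
    for k in (1, 3, 5, 15):
        assert (k * k + 3) % 4 == 0
        degrees.add((k * k + 3) // 4)
    return sorted(degrees)

def petersen_adjacency():
    """Adjacency matrix of the Petersen graph (n = 3, 10 vertices): vertices
    are the 2-subsets of {0,...,4}, adjacent when they are disjoint."""
    vertices = list(combinations(range(5), 2))
    return [[int(not set(u) & set(v)) for v in vertices] for u in vertices]

def satisfies_matrix_identity(A, n):
    """Check A^2 = (n - 1) I - A + J entry by entry."""
    size = len(A)
    for i in range(size):
        for j in range(size):
            square = sum(A[i][k] * A[k][j] for k in range(size))
            if square != (n - 1) * (i == j) - A[i][j] + 1:
                return False
    return True
```

The same identity holds for the 5-cycle with $n = 2$, the other small extremal graph.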


Remark 12.15. The proof of the lemma actually yields the following result:
if $G$ is a graph in which any cycle has length at least 5 and any vertex has
degree at least $n$, then $G$ has at least $n^2 + 1$ vertices. So the theorem
proved above actually classifies the extremal case. At the time of this
writing, it is unknown whether or not there exists a graph as in the theorem
for $n = 57$ (for all the other values of $n$, such graphs exist).


12.5 The linear independence trick 


23. In an m x n table, real numbers are written such that for any two rows 
and any two columns, the sum of the numbers situated in opposite cor- 
ners of the rectangle formed by them is equal to the sum of the numbers 
situated in the other two opposite corners. Some of the numbers are 
erased, but the remaining ones allow us to find the erased numbers using the
above property. Prove that at least $n + m - 1$ numbers remained on the
table.


Russian Olympiad 1971 


Proof. Consider the set $X$ of all $m \times n$ matrices with real entries
such that for any two rows and columns, the sum of the numbers situated in the
opposite corners of the rectangle formed by them is equal to the sum of the
numbers situated in the other two opposite corners. This set is obviously a
linear subspace of $M_{m \times n}(\mathbb{R})$. If $S$ is the set of pairs
$(i, j)$ such that the number situated at the entry $(i, j)$ has not been
erased, then the hypothesis can be translated into: the map
$f : X \to \mathbb{R}^{|S|}$ obtained by sending a matrix $A \in X$ to the
collection $(a_{ij})_{(i,j) \in S}$ is injective. But then the dimension of
$X$ has to be at most the dimension of $\mathbb{R}^{|S|}$. Thus, $|S|$ is at
least equal to the dimension of $X$ as $\mathbb{R}$-vector space. Therefore it
is enough to exhibit $m + n - 1$ linearly independent matrices in $X$. This is
easy: simply consider the matrices $A_j$ having the $j$-th row equal to the
vector $(1, 1, \ldots, 1)$ and the other rows zero (and this for all
$1 \le j \le m$), then the matrices $B_i$ having the $i$-th column consisting
of ones and the remaining entries 0 (and this for all $1 \le i \le n - 1$).
It is immediate to check that they are in $X$ and linearly independent. □


Once the necessary mathematical translations are done, the proof of the 
following problem is shorter than its statement... 


24. In a contest consisting of n problems, the jury defines the difficulty of 
each problem by assigning it a positive integral number of points. Any 
participant who answers the problem correctly receives that number of 
points for the problem; any other participant receives 0 points. After the 
participants submitted their answers in such a contest, the jury realized 
that given any ordering of the participants, it could have defined the 
problems’ difficulty levels to make that ordering coincide with the par- 
ticipants’ ranking according to their total scores. Determine, in terms 
of n, the maximum number of participants for which such a scenario 
could occur. 


Russian Olympiad 2001 


Notes on the statement. The same number of points may be assigned to
different problems. Ties are not permitted.


Proof. It is clear that we may have $n$ participants: impose that participant
$i$ solves problem $i$ and nothing else. We will prove that we cannot have
more than $n$ participants. Associate to participant $j$ the vector $x_j$
recording, for each problem, whether he solved it (coordinate 1) or not
(coordinate 0), so that his total score is $\langle x_j, y\rangle$ when $y$ is
the vector of points assigned to the problems. The hypothesis implies that
$x_j \in \mathbb{R}^n_+$ and for every ordering of the $x_j$'s, there is a
vector $y \in \mathbb{R}^n_+$ which gives that ordering of
$\langle x_j, y\rangle$. On the other hand, if the number of participants $m$
is greater than $n$, there is a nontrivial linear combination of the $x_j$'s
that vanishes. By separating the coefficients of such a combination according
to their sign, we find an equality
$\sum_{i \in I} \alpha_i x_i = \sum_{j \in J} \beta_j x_j$ for some disjoint
subsets $I, J$ of indices and positive numbers $\alpha_i, \beta_j$ such that
$\sum_{i \in I} \alpha_i \ge \sum_{j \in J} \beta_j$. But it is clearly
impossible to rank all $x_i$ (with $i \in I$) above all $x_j$ (with
$j \in J$). □


There is no known combinatorial proof of the following result. On the 
other hand, the algebraic proof is rather simple. 


25. Let $m > n + 1$ and let $A_1, A_2, \ldots, A_m$ be subsets of
$\{1, 2, \ldots, n\}$. Then there are disjoint sets $I, J$ such that
$\bigcup_{i \in I} A_i = \bigcup_{j \in J} A_j$ and
$\bigcap_{i \in I} A_i = \bigcap_{j \in J} A_j$.

Lindström's theorem


Proof. Associate to each set $A_i$ a vector $v_i \in \mathbb{R}^{2n}$, say

$$v_i = (x_1, x_2, \ldots, x_n, 1 - x_1, 1 - x_2, \ldots, 1 - x_n),$$

where $x_j = 1$ if $j \in A_i$ and 0 otherwise. We obtain at least $n + 2$
vectors that lie in the $(n + 1)$-dimensional vector subspace of
$\mathbb{R}^{2n}$ defined by the equations
$x_1 + y_1 = x_2 + y_2 = \cdots = x_n + y_n$. Thus, these vectors are linearly
dependent and we can find $a_1, a_2, \ldots, a_m \in \mathbb{R}$ not all zero
and such that $\sum_{i=1}^{m} a_i v_i = 0$. Define now
$I = \{i \mid a_i > 0\}$ and $J = \{i \mid a_i < 0\}$. Clearly $I, J$ are
disjoint and we prove that $\bigcup_{i \in I} A_i = \bigcup_{j \in J} A_j$ and
$\bigcap_{i \in I} A_i = \bigcap_{j \in J} A_j$. This is very easy: if
$j \in \bigcup_{i \in I} A_i$ but $j \notin \bigcup_{i \in J} A_i$, then by
definition the $j$th component of all $v_i$ with $i \in J$ is 0, whereas there
exists an $i \in I$ with the $j$th component of $v_i$ nonzero. But then,
looking at the $j$th component in the equality $\sum_{i=1}^{m} a_i v_i = 0$
yields a contradiction. By symmetry, we deduce that
$\bigcup_{i \in I} A_i = \bigcup_{j \in J} A_j$. Similarly, looking at the
$(n+1)$th, ..., $2n$th coordinates in the equality
$\sum_{i=1}^{m} a_i v_i = 0$ we can show that
$\bigcap_{i \in I} A_i = \bigcap_{j \in J} A_j$, which finishes the proof of
the theorem. □
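The proof is effective: a linear dependency among the vectors $v_i$ produces the two index sets. The sketch below follows this recipe with exact rational arithmetic. The names `dependency` and `lindstrom_split` are hypothetical, the sets are indexed from 0, and it is assumed that at least $n + 2$ sets are supplied so that a dependency exists.

```python
from fractions import Fraction

def dependency(vectors):
    """Rational coefficients a, not all zero, with sum(a_i * v_i) = 0;
    returns None if the vectors are linearly independent."""
    m, dim = len(vectors), len(vectors[0])
    # Row r of the system holds the r-th coordinates of v_1, ..., v_m.
    rows = [[Fraction(v[r]) for v in vectors] for r in range(dim)]
    pivot_row_of, rank = {}, 0
    for col in range(m):
        p = next((i for i in range(rank, dim) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        lead = rows[rank][col]
        rows[rank] = [x / lead for x in rows[rank]]
        for i in range(dim):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        pivot_row_of[col] = rank
        rank += 1
    free = [c for c in range(m) if c not in pivot_row_of]
    if not free:
        return None
    a = [Fraction(0)] * m
    a[free[0]] = Fraction(1)  # one free coefficient set to 1, the others to 0
    for col, row in pivot_row_of.items():
        a[col] = -rows[row][free[0]]
    return a

def lindstrom_split(sets, n):
    """Index sets (I, J) as in the proof, for at least n + 2 subsets of {1..n}."""
    vectors = [[int(j in s) for j in range(1, n + 1)] +
               [1 - int(j in s) for j in range(1, n + 1)] for s in sets]
    a = dependency(vectors)
    positive = [i for i, c in enumerate(a) if c > 0]
    negative = [i for i, c in enumerate(a) if c < 0]
    return positive, negative
```

Both index sets are nonempty: comparing the coordinates $j$ and $n + j$ of the dependency shows that the coefficients $a_i$ sum to zero, so they cannot all have the same sign.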
In order to motivate the following result, we need some preliminaries.
Suppose that $\mathcal{F}$ is a collection of subsets of some set $X$. We say
that $\mathcal{F}$ shatters a subset $S$ of $X$ if
$\{A \cap S \mid A \in \mathcal{F}\}$ contains all subsets of $S$. The
Vapnik-Chervonenkis dimension of $\mathcal{F}$ (also denoted
$\mathrm{vc}(\mathcal{F})$) is the maximal cardinality of a set shattered by
$\mathcal{F}$. For instance, let $X = \mathbb{R}^n$ and let $\mathcal{F}$ be
the collection of all affine half-spaces. Then
$\mathrm{vc}(\mathcal{F}) = n + 1$, though this is not obvious. Note that
$\mathcal{F}$ shatters the set of vertices of a nondegenerate simplex with
$n + 1$ vertices, so $\mathrm{vc}(\mathcal{F}) \ge n + 1$. The converse
inequality follows from the following classical theorem of Radon: if
$S \subset \mathbb{R}^n$ is a set with more than $n + 1$ elements, then one
can find disjoint subsets $A, B$ of $S$ such that the convex hulls of $A$ and
$B$ have common points. The following beautiful result shows that large
collections shatter big sets.


26. Let $\mathcal{F}$ be a family of subsets of $\{1, 2, \ldots, n\}$ such
that

$$|\mathcal{F}| > \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{k-1}.$$

Then $\mathrm{vc}(\mathcal{F}) \ge k$.

Sauer-Shelah's lemma


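Proof. Let us assume that $\mathrm{vc}(\mathcal{F}) \le k - 1$. We will prove
that

$$|\mathcal{F}| \le \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{k-1}.$$

Let $P_{k-1}$ be the family of all subsets $A$ of $\{1, 2, \ldots, n\}$ such
that $|A| \le k - 1$. To any $X \in \mathcal{F}$ associate the map
$f_X : P_{k-1} \to \mathbb{R}$ defined by $f_X(A) = 1_{A \subset X}$ (this is
equal to 1 if $A \subset X$ and to 0 otherwise). We will prove that the maps
$f_X$ are linearly independent when $X$ runs over the elements of
$\mathcal{F}$. This will clearly imply the desired bound on the size of
$\mathcal{F}$. Assume that $\sum_{X \in \mathcal{F}} a_X f_X = 0$ for some
real numbers $a_X$, not all equal to zero. Evaluating at elements
$A \in P_{k-1}$, we deduce that for any such $A$ we have
$\sum_{X \in \mathcal{F},\, A \subset X} a_X = 0$. The key step is to choose a
minimal subset $A$ such that
$\sum_{X \in \mathcal{F},\, A \subset X} a_X \ne 0$ (such a subset exists: any
maximal $X$ with $a_X \ne 0$ has this property; moreover $|A| \ge k$ by the
previous observation). We will prove that for any $B \subset A$ we can find
$X \in \mathcal{F}$ such that $B = A \cap X$. This will contradict the
property of $\mathcal{F}$ and the result will follow. It is of course enough
to prove that $\sum_{X \in \mathcal{F},\, X \cap A = B} a_X \ne 0$. But by the
inclusion-exclusion principle we have

$$\sum_{X \in \mathcal{F},\, X \cap A = B} a_X = \sum_{B \subset C \subset A} (-1)^{|C \setminus B|} \sum_{X \in \mathcal{F},\, C \subset X} a_X.$$

However, by minimality of $A$ all sums
$\sum_{X \in \mathcal{F},\, C \subset X} a_X$ are zero, except for $C = A$.
Thus the right-hand side is nothing more than
$(-1)^{|A \setminus B|} \sum_{X \in \mathcal{F},\, A \subset X} a_X$, which is
nonzero by the choice of $A$. This finishes the proof.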


Proof. We will prove by induction on $|\mathcal{F}|$ the following statement: for any $n$
and any family $\mathcal{F}$ of subsets of $\{1,2,\ldots,n\}$, $\mathcal{F}$ shatters at least $|\mathcal{F}|$ subsets of
$\{1,2,\ldots,n\}$. This trivially implies the desired result. Now, the result clearly
holds for $|\mathcal{F}| = 1$, so assume that it holds for all $|\mathcal{F}| < N$ and take a family
$\mathcal{F}$ with $N$ elements, all of them subsets of $\{1,2,\ldots,n\}$. Let $\mathcal{F}_0$ consist of the
elements of $\mathcal{F}$ not containing 1 (we assume, without loss of generality, that 1
occurs in some, but not all elements of $\mathcal{F}$). By induction, $\mathcal{F}_0$ shatters at least
$|\mathcal{F}_0|$ subsets of $\{2,3,\ldots,n\}$. Also by induction, $\mathcal{F}_1 = \{X \setminus \{1\} : X \in \mathcal{F},\ 1 \in X\}$
shatters at least $|\mathcal{F}_1|$ subsets of $\{2,3,\ldots,n\}$. But if $A \subset \{2,3,\ldots,n\}$ is
shattered by $\mathcal{F}_1$ then $A$ is also shattered by $\{X \in \mathcal{F} \mid 1 \in X\}$, and if $A$ is
shattered by both $\mathcal{F}_0$ and $\mathcal{F}_1$ then both $A$ and $A \cup \{1\}$ are shattered by $\mathcal{F}$.
We conclude that at least $|\mathcal{F}_0| + |\mathcal{F}_1| = |\mathcal{F}|$ subsets of $\{1,2,\ldots,n\}$ are shattered
by $\mathcal{F}$, and the result follows. □
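The statement proved by induction above (a family shatters at least as many sets as it has members) can be checked by brute force on small examples. The following is only an illustrative sketch: the helper names `shatters`, `shattered_sets` and `vc_dimension` and the sample family are assumptions made for this example, and the exhaustive search is only practical for small $n$.

```python
from itertools import combinations
from math import comb


def shatters(family, s):
    """True if the family (a list of frozensets) shatters the frozenset s."""
    traces = {x & s for x in family}
    return len(traces) == 2 ** len(s)


def shattered_sets(family, n):
    """All subsets of {1, ..., n} shattered by the family (exhaustive search)."""
    ground = range(1, n + 1)
    return [frozenset(c)
            for size in range(n + 1)
            for c in combinations(ground, size)
            if shatters(family, frozenset(c))]


def vc_dimension(family, n):
    """Largest cardinality of a shattered set (family assumed nonempty)."""
    return max(len(s) for s in shattered_sets(family, n))


# Hypothetical sample: all subsets of {1,2,3,4} with at most 2 elements,
# plus the single set {1,2,3}.
n, k = 4, 3
family = [frozenset(c)
          for size in range(k)
          for c in combinations(range(1, n + 1), size)]
family.append(frozenset({1, 2, 3}))

bound = sum(comb(n, i) for i in range(k))  # C(n,0) + ... + C(n,k-1)
print(len(family), bound, vc_dimension(family, n), len(shattered_sets(family, n)))
```

In this sample the family has one more member than the bound of problem 26, so the lemma forces a shattered set with $k$ elements.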


Remark 12.16. Here is a beautiful geometrical interpretation of the previous
result: let $S$ be a subset of $\{-1,1\}^n$ with more than $\sum_{i=0}^{k-1}\binom{n}{i}$ elements. Then
there exists a subset $I \subset \{1,2,\ldots,n\}$ with $k$ elements such that

$$\{(x_i)_{i\in I} \mid x \in S\} = \{-1,1\}^{|I|}.$$

That is, a large subset of the vertices of the hypercube has a coordinate
projection covering a hypercube in a smaller dimension!


We present two proofs of the following classical theorem of Bollobás. One
of the proofs needs some preliminaries on resultants. Let $K$ be a field and let
$f = a_0 + a_1X + \cdots + a_pX^p$ and $g = b_0 + b_1X + \cdots + b_qX^q$ be two polynomials
with coefficients in $K$, of degrees $p$ and $q$. Let $K[X]_n$ be the $(n+1)$-dimensional $K$-vector space of
polynomials of degree at most $n$ in $K[X]$. It is easy to see that $\gcd(f,g) = 1$
if and only if the map $\varphi : K[X]_{q-1} \times K[X]_{p-1} \to K[X]_{p+q-1}$, $\varphi(U,V) =
Uf + Vg$ is injective. Since the source and the target are $K$-vector spaces
of the same finite dimension, it follows that $\gcd(f,g) = 1$ if and only if $\varphi$ is
an invertible linear map, i.e. if and only if the matrix of $\varphi$ in the natural
bases of $K[X]_{q-1} \times K[X]_{p-1}$ and $K[X]_{p+q-1}$ is invertible. This matrix has as its
first $q$ columns the coordinates of $f, Xf, \ldots, X^{q-1}f$ with respect to the basis
$(1, X, \ldots, X^{p+q-1})$ and as its last $p$ columns the coordinates of $g, Xg, \ldots, X^{p-1}g$
with respect to the same basis. The determinant of this matrix is called the
resultant of $f$ and $g$ and is denoted $\mathrm{Res}(f,g)$. The previous discussion shows
that $\mathrm{Res}(f,g) = 0$ if and only if $f$ and $g$ have a common nontrivial divisor,
which is the same as saying that they have a common root in some algebraic
closure of $K$.
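To make the construction concrete, here is a small sketch that builds the matrix described above (columns $f, Xf, \ldots, X^{q-1}f, g, Xg, \ldots, X^{p-1}g$ in the basis $1, X, \ldots, X^{p+q-1}$) and computes its determinant with exact rational arithmetic. The function names and the sample polynomials are assumptions chosen for illustration, and the sign convention may differ from other definitions of the resultant.

```python
from fractions import Fraction


def sylvester_matrix(f, g):
    """Matrix of (U, V) -> U*f + V*g in the monomial bases.

    f and g are coefficient lists [c0, c1, ...], lowest degree first,
    with nonzero last entry, so p = len(f) - 1 and q = len(g) - 1.
    """
    p, q = len(f) - 1, len(g) - 1
    size = p + q
    cols = []
    for shift in range(q):  # coordinates of X^shift * f
        cols.append([0] * shift + list(f) + [0] * (size - shift - p - 1))
    for shift in range(p):  # coordinates of X^shift * g
        cols.append([0] * shift + list(g) + [0] * (size - shift - q - 1))
    # cols[j] is the j-th column; build the matrix row by row.
    return [[Fraction(cols[j][i]) for j in range(size)] for i in range(size)]


def determinant(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def resultant(f, g):
    return determinant(sylvester_matrix(f, g))


f = [2, -3, 1]    # (X - 1)(X - 2)
g = [-10, 3, 1]   # (X - 2)(X + 5), shares the root 2 with f
h = [-15, 2, 1]   # (X - 3)(X + 5), no common root with f
print(resultant(f, g), resultant(f, h))
```

The first value vanishes because of the common root, while the second does not, in line with the criterion stated above.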


27. Let $a, b$ be positive integers and let $A_1, A_2, \ldots, A_m$ and $B_1, B_2, \ldots, B_m$
be sets such that

a) $|A_1| = |A_2| = \cdots = |A_m| = a$ and $|B_1| = |B_2| = \cdots = |B_m| = b$.
b) $A_i \cap B_j$ is nonempty if and only if $i \ne j$.

Then $m \le \binom{a+b}{a}$.

Bollobás' theorem


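Proof. We will only assume that $A_i \cap B_i = \emptyset$ for all $i$ and that $A_i \cap B_j \ne \emptyset$ for $i < j$. We may assume that
$A_i, B_j$ are subsets of $\{1,2,\ldots,n\}$ for some $n$. Choose $v_1, v_2, \ldots, v_n \in \mathbb{R}^{a+1}$
such that any $a+1$ vectors among them are linearly independent (this is easily
achievable). Let $v_i^{\perp} \in \mathbb{R}^{a+1}$ be a nonzero vector orthogonal to all $v_u$ with $u \in A_i$. Such
a vector exists and is unique up to a scalar multiple as by construction the
vectors $(v_u)_{u\in A_i}$ span a hyperplane of $\mathbb{R}^{a+1}$.

Consider the polynomials

$$f_j(X) = \prod_{x\in B_j} \langle v_x, X\rangle,$$

where $\langle\cdot,\cdot\rangle$ is the standard inner product and $X = (X_1, X_2, \ldots, X_{a+1})$. These
are homogeneous polynomials of degree $b$ in $a+1$ variables, living thus in a
space of dimension $\binom{a+b}{b}$. We claim that $f_j(v_i^{\perp}) = 0$ for $i < j$ and $f_i(v_i^{\perp}) \ne 0$,
which implies that the $f_i$ are linearly independent and so $m \le \binom{a+b}{b}$. To prove
the claim, let $i < j$. Since $A_i \cap B_j \ne \emptyset$, there is $x \in B_j \cap A_i$ and so $\langle v_x, v_i^{\perp}\rangle = 0$.
Thus $f_j(v_i^{\perp}) = 0$ for $i < j$. Also, since $A_i \cap B_i = \emptyset$, we have $\langle v_x, v_i^{\perp}\rangle \ne 0$ for
all $x \in B_i$ (otherwise $v_x$ would lie in the hyperplane spanned by the $v_u$ with $u \in A_i$, contradicting
the choice of the vectors), thus $f_i(v_i^{\perp}) \ne 0$. □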




Proof. Of course, we may assume that all $A_i, B_j$ are subsets of $\mathbb{R}$. Consider
the polynomials

$$f_i(X) = \prod_{\alpha\in A_i} (X - \alpha), \qquad g_j(X) = \prod_{\beta\in B_j} (X - \beta).$$

By assumption, $f_i$ and $g_j$ have a common root if and only if $i \ne j$. Thus, if
$R(f,g)$ is the resultant of two polynomials $f, g$, we have $R(f_i, g_j) = 0$ if and
only if $i \ne j$.
Now, write

$$g(X) = g_0 + g_1X + \cdots + g_bX^b$$

and observe that the expression

$$\Delta_i(g_0, \ldots, g_b) = R(f_i, g)$$

is a homogeneous polynomial of degree $a$ in the variables $g_0, \ldots, g_b$. Moreover,
the previous results show that $\Delta_i(g_{j0}, \ldots, g_{jb}) = 0$ if and only if $i \ne j$, where
$g_{j0}, \ldots, g_{jb}$ are the coefficients of $g_j$. Thus,
the polynomials $\Delta_i$ are linearly independent and homogeneous of degree $a$ in
$b+1$ variables. Since the vector space of such polynomials has dimension
$\binom{a+b}{a}$, the result follows. □
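The bound $\binom{a+b}{a}$ cannot be improved: taking for the $A_i$ all $a$-element subsets of $\{1,\ldots,a+b\}$ and for $B_i$ the complement of $A_i$ gives a family satisfying both conditions. The sketch below checks this standard example by brute force; the function names and the sample values of $a$ and $b$ are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def extremal_pairs(a, b):
    """Pairs (A_i, B_i): all a-subsets of {1..a+b} with their complements."""
    ground = set(range(1, a + b + 1))
    return [(set(A), ground - set(A)) for A in combinations(sorted(ground), a)]


def satisfies_conditions(pairs, a, b):
    """Check |A_i| = a, |B_i| = b and: A_i meets B_j exactly when i != j."""
    sizes_ok = all(len(A) == a and len(B) == b for A, B in pairs)
    cross_ok = all((len(A & B) > 0) == (i != j)
                   for i, (A, _) in enumerate(pairs)
                   for j, (_, B) in enumerate(pairs))
    return sizes_ok and cross_ok


a, b = 2, 3
pairs = extremal_pairs(a, b)
print(len(pairs), comb(a + b, a), satisfies_conditions(pairs, a, b))
```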


The following problem is very challenging. We present the author’s proof. 


28. Let $x_1, x_2, \ldots, x_n$ be distinct real numbers and suppose that the vector
space spanned by $x_i - x_j$ over the rationals has dimension $m$. Then the
vector space spanned only by those $x_i - x_j$ for which $x_i - x_j \ne x_k - x_l$
whenever $(i,j) \ne (k,l)$ also has dimension $m$.

Straus's theorem


Proof. By working with the numbers $x_i - x_1$, we may assume that $x_1 = 0$.
Let $V$ be the span of all differences $x_i - x_j$, so $\dim V = m$. Say a difference
$x_i - x_j$ is unique if $x_i - x_j = x_k - x_l$ implies that $i = k$ and $j = l$. Let $V'$ be
the span of the unique differences, let $m'$ be its dimension and suppose that
$m' < m$. As $V'$ is a subspace of $V$, there is a basis $v_1, v_2, \ldots, v_m$ of $V$ such
that $v_1, v_2, \ldots, v_{m'}$ is a basis of $V'$. As $x_1 = 0$, we have $x_i \in V$ for all $i$, so we
can write

$$x_i = \sum_{l=1}^{m} x_i^{(l)} v_l$$

with $x_i^{(l)} \in \mathbb{Q}$. Note that there exists $r$ such that $x_r^{(m)} \ne 0$, as otherwise $V$
would be included in the span of the vectors $v_1, v_2, \ldots, v_{m-1}$, contradicting
that $\dim V = m$.
Let now $x_i(t) = x_i + t x_i^{(m)}$. The crucial fact is the following

Lemma 12.17. There exists $t \in \mathbb{R}$ such that
1) If $x_i(t) = x_j(t)$, then $i = j$.
2) $\max_i x_i(t) - \min_i x_i(t) > \max_i x_i - \min_i x_i$.

Proof. For all $i \ne j$, the equation $x_i(t) = x_j(t)$ has at most one solution
(recall that $x_i \ne x_j$), so 1) fails for at most finitely many $t$. On the other
hand, $x_1(t) = 0$, so if $r$ is chosen such that $x_r^{(m)} \ne 0$, then the left-hand side
of 2) is at least $x_r + t x_r^{(m)}$ and this becomes arbitrarily large for $t \gg 0$ or
$t \ll 0$. The conclusion follows. □

Coming back to the proof, fix such a $t$ and choose $j, k$ such that $x_j(t) =
\max_i x_i(t)$ and $x_k(t) = \min_i x_i(t)$. We claim that the difference $x_j - x_k$ is
unique and $x_j^{(m)} \ne x_k^{(m)}$. This will yield a contradiction, as $x_j - x_k \in V'$ in
this case, so its coordinate with respect to $v_m$ has to be zero. To prove the
claim, assume that $x_j - x_k = x_u - x_v$. Looking at the $m$th coordinate, we
obtain $x_j^{(m)} - x_k^{(m)} = x_u^{(m)} - x_v^{(m)}$, so that $x_j(t) - x_k(t) = x_u(t) - x_v(t)$. But
then clearly $x_j(t) = x_u(t)$ and $x_k(t) = x_v(t)$, so that by the choice of $t$ we
must have $j = u$ and $k = v$. Finally, if $x_j^{(m)} = x_k^{(m)}$, then

$$x_j(t) - x_k(t) = x_j - x_k \le \max_i x_i - \min_i x_i,$$

a contradiction. □
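The statement of problem 28 can be explored on small examples with exact arithmetic. In the sketch below, a real number of the form $p + q\sqrt{2}$ with rational $p, q$ is stored as the pair $(p, q)$; this representation, the helper names and the sample points are assumptions made for illustration only. Since $1$ and $\sqrt{2}$ are linearly independent over the rationals, the rank of a set of such pairs equals the dimension of the rational span of the corresponding numbers.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def rank(vectors):
    """Rank over Q of a list of equal-length tuples (Gaussian elimination)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def differences(points):
    """All differences x_i - x_j over ordered pairs with i != j."""
    return [tuple(s - t for s, t in zip(p, q)) for p, q in permutations(points, 2)]


def unique_differences(points):
    """Differences attained by exactly one ordered pair (i, j)."""
    diffs = differences(points)
    counts = Counter(diffs)
    return [d for d in diffs if counts[d] == 1]


# Hypothetical sample: 0, 1, 2, sqrt(2), 1 + sqrt(2) as pairs (p, q).
points = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
print(rank(differences(points)), rank(unique_differences(points)))
```

Here the difference $1$ is attained by several pairs and is therefore discarded, yet the unique differences still span a space of the same dimension, as the theorem predicts.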




12.6 Applications to geometry 


In this section we discuss some rather hard problems with geometric flavor. 
Before passing to the first problem, we need to recall a few basic facts from 
convex geometry. Suppose that $K$ is a closed convex subset of $\mathbb{R}^n$. The dual
of $K$ is

$$K^* = \{x \in \mathbb{R}^n \mid \langle x, v\rangle \ge 0,\ \forall v \in K\},$$

where $\langle\cdot,\cdot\rangle$ is the standard inner product on $\mathbb{R}^n$. It is easy to see that $K^*$ is
again a closed convex set. Standard but nontrivial separation theorems in
convex geometry imply the fundamental equality $K^{**} = K$ for any closed
convex cone $K$ in $\mathbb{R}^n$. Classical examples of closed convex sets are the convex
hull of finitely many points and the cone generated by finitely many vectors.³
Applying the equality $K^{**} = K$ to the cone generated by $v_1, v_2, \ldots, v_k$, we
obtain the following beautiful result, which is one of the many versions of
Farkas' lemma:

Theorem 12.18. (Farkas' lemma) Let $v, v_1, v_2, \ldots, v_k \in \mathbb{R}^n$ and suppose that
$\langle w, v\rangle \ge 0$ whenever $\min_i \langle w, v_i\rangle \ge 0$. Then $v$ is in the cone generated by
$v_1, v_2, \ldots, v_k$.


The following hard problem is a very nice way to re-state Farkas’ lemma. 


29. A figure composed of 1 by 1 squares has the property that if the squares
of a fixed $m$ by $n$ rectangle are filled with numbers the sum of all of
which is positive, the figure can be placed on the rectangle⁴,⁵ so that the
numbers it covers also have positive sum. Prove that a number of such
figures can be placed on the rectangle such that each square is covered
by the same number of figures.

Russia 1998

³ If $v_1, v_2, \ldots, v_k \in \mathbb{R}^n$, the cone generated by these vectors is the set of linear combinations
$a_1v_1 + a_2v_2 + \cdots + a_kv_k$, where $a_i$ are nonnegative real numbers. It is clear that this cone
is a convex set, but it is not really obvious that it is a closed subset of $\mathbb{R}^n$.

⁴ Possibly after being rotated by a multiple of $\pi/2$.

⁵ However, the figure may not have any of its squares outside the rectangle.


Proof. The first step (and the trickiest...) is to restate the problem in a more
algebraic way. Suppose that we have $N$ ways to put the figure in the rectangle,
possibly after being rotated, and call these ways $1, 2, \ldots, N$. To each such way
$i$ associate its incidence vector $v_i \in \mathbb{R}^{mn}$, where the $j$th coordinate of $v_i$ is 1 if
the square $j$ is covered and 0 otherwise (we number the squares of the table
in some way from 1 to $mn$). Now, the hypothesis becomes: for any $v \in \mathbb{R}^{mn}$
such that $\langle v, \mathbf{1}\rangle > 0$, we also have $\max_i \langle v, v_i\rangle > 0$. Here $\mathbf{1} \in \mathbb{R}^{mn}$ is the vector
all of whose coordinates are 1. The result then follows from the following

Lemma 12.19. Let $v, v_1, v_2, \ldots, v_N \in \mathbb{Q}^n$ be vectors, with $v$ nonzero. Suppose
that for all $w \in \mathbb{R}^n$, if $\langle w, v\rangle > 0$, then $\max_i \langle w, v_i\rangle > 0$. Then there are rational
nonnegative numbers $a_1, a_2, \ldots, a_N$ such that $v = a_1v_1 + a_2v_2 + \cdots + a_Nv_N$.

If we accept this for a moment, we deduce (after clearing denominators) the existence of nonnegative
integers $a_1, a_2, \ldots, a_N$ and of a positive integer $a$ such that $a \cdot \mathbf{1} = a_1v_1 + a_2v_2 +
\cdots + a_Nv_N$. So, if we put $a_i$ copies of the $i$th placement of the figure, each
square is covered $a$ times and the problem is solved.

Now, let us turn to the proof of the lemma. By Farkas' lemma, we can find
nonnegative real numbers $x_1, x_2, \ldots, x_N$ such that $v = x_1v_1 + x_2v_2 + \cdots + x_Nv_N$.
Forgetting about those indices $i$ such that $x_i = 0$, we find a linear system in
the $x_i$'s (obtained by writing the previous equality coordinate by coordinate),
whose associated matrix has rational entries and which has positive solutions.
We claim that such a system also has positive rational solutions. This follows
from the following standard lemma:

Lemma 12.20. If a linear system of equations with rational coefficients has
real solutions, then the rational solutions are dense in the set of real solutions.

Proof. As the associated matrix has rational entries, if we put it in reduced
row-echelon form (i.e. perform Gaussian elimination), we immediately deduce
that the space of solutions has a basis with rational coordinates. The result
follows from the density of $\mathbb{Q}$ in $\mathbb{R}$. □

We're done. □


To motivate the following problems, we need to recall a famous result of
Dehn, also known as Hilbert's third problem: given two polyhedra in $\mathbb{R}^3$ with
equal volume, can we always cut the first one into finitely many polyhedra
which can be reassembled to yield the second polyhedron? Dehn showed that
the answer is negative in a rather complicated way, but nowadays there is a
very beautiful and short proof due to Hadwiger, which we will sketch now.
We claim that the regular simplex and the cube yield a negative answer to
Hilbert's problem. Let $\alpha = \arccos\frac{1}{3}$ be the measure of the dihedral angles of the
regular simplex. It is easy to see that $\frac{\alpha}{\pi}$ is irrational (using the fact that if $r$
and $\cos r\pi$ are in $\mathbb{Q}$, then $\cos r\pi \in \{\pm\frac{1}{2}, \pm 1, 0\}$, which was explained in chapter
9, section 9.2). Let $\alpha_1, \ldots, \alpha_n$ be the dihedral angles appearing at the edges
of the pieces of the dissection and let $V$ be the $\mathbb{Q}$-vector space generated by
them together with $\pi$ and $\alpha$. As $\frac{\alpha}{\pi}$ is irrational, there is a linear form $f : V \to \mathbb{Q}$ such that $f(\pi) = 0$
and $f(\alpha) = 1$. If $P$ is a polyhedron appearing in the dissection, define its
Dehn invariant by $H(P) = \sum_e |e| f(\alpha_e)$, the summation being taken over the
edges $e$ of $P$, $|e|$ being the length of $e$ and $\alpha_e$ the dihedral angle at $e$. Simple
geometry shows that this invariant is additive. Since the invariant of the cube
is 0 and that of the regular simplex is not, it follows that we cannot cut the
regular simplex into pieces that can be reassembled to form a cube.
The following problem uses a similar argument, which is extremely useful in
tiling problems. We need one more preliminary result: any vector space has a
basis and any linearly independent set of vectors extends to a basis.


30. An $a \times b$ rectangle is divided into squares with sides parallel to those of
the rectangle and with side lengths $x_1, x_2, \ldots, x_n$. Prove that $\frac{a}{x_i}$ and $\frac{b}{x_i}$
are rational numbers.

Dehn's theorem


Proof. The first ingredient is the following

Lemma 12.21. There exists an additive map $f : \mathbb{R} \to \mathbb{R}$ such that $f(x) = 0$
if and only if $x$ is a rational multiple of $a$.

Proof. Consider a basis $B$ of $\mathbb{R}$ as a $\mathbb{Q}$-vector space such that $a \in B$.
Choose a bijection $g : B - \{a\} \to B$ ($g$ exists, because $B$ is an infinite set)
and define $f(a) = 0$ and $f(x) = g(x)$ for all $x \in B - \{a\}$. To define $f$ for any
real number $r$, express $r$ as a finite combination $r = x_1b_1 + x_2b_2 + \cdots + x_nb_n$
of distinct elements $b_i \in B$ with rational coefficients $x_i$ and define

$$f(r) = x_1f(b_1) + x_2f(b_2) + \cdots + x_nf(b_n).$$

We obtain in this way an additive map $f$ and we have $f(r) = 0$ if and only if $r$
is a rational multiple of $a$. Indeed, if $f(r) = 0$ and $r = x_1b_1 + x_2b_2 + \cdots + x_nb_n$
is as above, then

$$x_1f(b_1) + x_2f(b_2) + \cdots + x_nf(b_n) = 0.$$

Now the elements $f(b_i)$ are either equal to 0 (if $b_i = a$) or equal to $g(b_i)$.
Moreover, the $g(b_i)$ are distinct (because $g$ is injective and the $b_i$'s are distinct)
and are linearly independent (since they are elements of a basis $B$). The only
possibility to have the previous relation is $x_i = 0$ whenever $b_i \ne a$, which
means precisely that $r$ is a rational multiple of $a$. □


Define the invariant of a rectangle $R$ with side-lengths $x, y$ by

$$H(R) = f(x)f(y).$$

The following lemma establishes the additivity of this invariant.

Lemma 12.22. If a rectangle $R$ is partitioned into finitely many rectangles
$R_1, R_2, \ldots, R_q$, then $H(R) = H(R_1) + H(R_2) + \cdots + H(R_q)$.

Proof. This is essentially obvious by additivity of $f$: any finite partition with
rectangles can be refined to a partition which forms a grid (just extend all
sides of all rectangles in the partition until they meet the sides of $R$), so it
is enough to check the claim for a grid. This reduces then to proving that
if $R$ is partitioned into two rectangles $R_1, R_2$ by a line parallel to one of its
sides, then $H(R) = H(R_1) + H(R_2)$. But this is clear, by definition of $H$ and
because $f$ is additive. □
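The additivity on a grid can be checked mechanically in a toy setting. The sketch below is not the map $f$ of lemma 12.21 (which needs a basis of $\mathbb{R}$ over $\mathbb{Q}$): as a stand-in, it works only with numbers $p + q\sqrt{2}$, stored as pairs $(p, q)$ of rationals, and uses the additive map sending $p + q\sqrt{2}$ to $q$, which vanishes exactly on the rationals. All names and sample lengths are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def add(x, y):
    """Sum of two numbers p + q*sqrt(2) stored as pairs (p, q)."""
    return (x[0] + y[0], x[1] + y[1])


def f(x):
    """Toy additive map: the sqrt(2)-coordinate (zero exactly on rationals)."""
    return x[1]


def invariant(width, height):
    """H(R) = f(width) * f(height) for a rectangle with these side lengths."""
    return f(width) * f(height)


def total(lengths):
    result = (Fraction(0), Fraction(0))
    for x in lengths:
        result = add(result, x)
    return result


# A grid partition: column widths and row heights (all positive reals).
widths = [(Fraction(1), Fraction(0)),       # 1
          (Fraction(0), Fraction(1)),       # sqrt(2)
          (Fraction(1, 2), Fraction(2))]    # 1/2 + 2*sqrt(2)
heights = [(Fraction(0), Fraction(1)),      # sqrt(2)
           (Fraction(3), Fraction(-1, 2))]  # 3 - sqrt(2)/2

whole = invariant(total(widths), total(heights))
cells = sum(invariant(w, h) for w, h in product(widths, heights))
print(whole, cells)
```

Both quantities agree, which is the grid case of lemma 12.22 for this particular additive map.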


Coming back to the proof and using the lemma, we obtain

$$0 = f(a)f(b) = \sum_{i=1}^{n} f(x_i)^2,$$

which yields $f(x_i) = 0$ for all $i$ and so the numbers $\frac{a}{x_i}$ are rational. Since we could
have worked with $b$ instead of $a$ from the very beginning, it follows that the numbers $\frac{b}{x_i}$
are also rational and the theorem is proved. □


Remark 12.23. In [51], the following similar result is proved using Dehn’s 
original method: a rectangle R has at least one rational sidelength and is tiled 
by smaller rectangles, each having a rational perimeter. Then all sidelengths 
of R and of the tiling rectangles are rational. 


The following stronger result will be used in the next problems. The proof 
is very similar to that of problem 30. 


Theorem 12.24. (Dehn) Let the rectangle $R_0$ be tiled with rectangles
$R_1, R_2, \ldots, R_n$. Let $r_i$ be the side-ratio of $R_i$. Then $r_0 \in \mathbb{Q}(r_1, r_2, \ldots, r_n)$.

Proof. Suppose that $r_0 \notin L = \mathbb{Q}(r_1, r_2, \ldots, r_n)$, so we may pick a basis $B$ of $\mathbb{R}$
as $L$-vector space, such that the base and height $b_0, h_0$ of $R_0$ are in $B$. Define
the invariant of a rectangle $b \times h$ to be $c_1(b)c_2(h) - c_1(h)c_2(b)$, where $c_1(x), c_2(x)$
are the coordinates of $x$ with respect to $b_0, h_0 \in B$. As $c_i$ are $L$-linear maps, it
is easy to check that $R_i$ has invariant 0. An argument similar to the one used
in lemma 12.22 shows that the invariant is additive, so the invariant of $R_0$ is
the sum of the invariants of the $R_i$'s, that is 0. This contradicts the fact that
$R_0$ has invariant 1. □


We end this chapter with a truly amazing result, whose proof is taken 
from the beautiful paper [34]. The proof requires the following rather technical 
theorem, which we will take for granted, since the proof would take us quite 
far afield. 


Theorem 12.25. (Wall) If $r$ is an algebraic number all of whose conjugates
have positive real part, then there exist $n \ge 1$ and positive rational numbers
$c_1, c_2, \ldots, c_n$ such that

$$1 = c_1r + \cfrac{1}{c_2r + \cfrac{1}{c_3r + \cdots + \cfrac{1}{c_nr}}}.$$
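A small numerical illustration of such an identity may help. For $r = \frac{5+\sqrt{5}}{10}$, a root of $5X^2 - 5X + 1$ whose two conjugates are positive, one can take $n = 2$, $c_1 = 1$, $c_2 = 5$. The sketch below evaluates the continued fraction in floating point and also performs the alternating vertical and horizontal cuts of a unit square that the identity encodes; the function names and the choice of example are assumptions made for illustration, and floating-point arithmetic only gives approximate equalities.

```python
from math import isclose, sqrt


def wall_expression(r, cs):
    """Evaluate c1*r + 1/(c2*r + 1/(... + 1/(cn*r)))."""
    value = cs[-1] * r
    for c in reversed(cs[:-1]):
        value = c * r + 1 / value
    return value


def cut_unit_square(r, cs):
    """Cut pieces off a unit square, alternating vertical and horizontal cuts.

    Returns the list of pieces as (width, height) and the leftover rectangle.
    The k-th piece has side-ratio (width / height) equal to c*r for a
    vertical cut and 1/(c*r) for a horizontal cut.
    """
    width, height = 1.0, 1.0
    pieces = []
    for index, c in enumerate(cs):
        if index % 2 == 0:          # vertical cut, piece keeps the full height
            w = c * r * height
            pieces.append((w, height))
            width -= w
        else:                       # horizontal cut, piece keeps the full width
            h = c * r * width
            pieces.append((width, h))
            height -= h
    return pieces, (width, height)


r = (5 + sqrt(5)) / 10
cs = [1, 5]
pieces, leftover = cut_unit_square(r, cs)
print(wall_expression(r, cs), pieces, leftover)
```

When the identity holds, the last cut uses up the remaining rectangle, so the pieces cover the whole square.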


31. Let r be a positive real number. Prove the equivalence of the following 
statements: 


1) the unit square can be tiled with finitely many rectangles similar 
to the 1 x r rectangle and having sides parallel to the sides of the 
square. 


2) r is algebraic and all of its conjugates have positive real part. 


Laczkovich-Szekeres theorem 


Proof. Prepare for a long but exciting battle. Recall that the side-ratio of a
rectangle is the quotient between its horizontal side and its vertical side.

Step 0. Let us suppose that $r$ is algebraic and all of its conjugates
have positive real part. Choose $c_1, c_2, \ldots$ as in Wall's theorem and cut off a
rectangle of side-ratio $c_1r$ from a unit square, by a vertical cut. From the
remaining rectangle, cut off a rectangle of ratio $\frac{1}{c_2r}$ by a horizontal cut. The
remaining rectangle has side-ratio $c_3r + \cfrac{1}{c_4r + \cdots}$. Repeating the process, we tile
the unit square by rectangles with side-ratios $c_1r, \frac{1}{c_2r}, c_3r, \ldots$. We conclude
using the fact that the $c_i$'s are rational. This proves one implication. The
other implication is harder and requires the following steps.

Step 1. First, we prove that if there is a partition of a square with
rectangles whose side-ratio is $r$ or $1/r$, then $r$ is algebraic. Let $x$ be any
real number which is transcendental over $\mathbb{Q}(r)$ and scale the vertical axis
by a factor of $x$. Applying theorem 12.24 to this new situation, we deduce
that $x \in \mathbb{Q}(rx, x/r)$, so we have an equality $xP(rx, x/r) = Q(rx, x/r)$ for
$P, Q \in \mathbb{Q}[X,Y]$. As $x$ is transcendental over $\mathbb{Q}(r)$, the previous equality yields
$XP(rX, X/r) = Q(rX, X/r)$ in $\mathbb{Q}(r)[X]$. Looking at the leading coefficient,
we easily obtain a nontrivial polynomial with rational coefficients killing $r$,
thus $r$ is algebraic.

Step 2. We claim that if $r > 0$ is algebraic and has a conjugate with
real part smaller than or equal to 0, then it has a conjugate with negative real
part. Indeed, otherwise $r$ has a purely imaginary conjugate and so there is a
root $x$ of the minimal polynomial $p$ of $r$ such that $p(x) = p(-x) = 0$. As $p$
is irreducible, $p$ must divide $p(X) + p(-X)$ and so $p(r) + p(-r) = 0$. Thus
$-r < 0$ is a root of $p$ and we are done.


Step 3. Assume now that $r$ has a conjugate with negative real part. Let
$p$ be the minimal polynomial of $r$, of degree $n$, and let $Q$ be the companion matrix⁶ of $p$.
Let $B$ be a basis of $\mathbb{R}$ as $\mathbb{Q}(r)$-vector space, chosen such that $1 \in B$. If $x \in \mathbb{R}$,
its coordinate with respect to $1 \in B$ is of the form $a_0 + a_1r + \cdots + a_{n-1}r^{n-1}$ with
$a_i \in \mathbb{Q}$ and we let $v_x$ be the column vector with coordinates $a_0, a_1, \ldots, a_{n-1}$.
Thus $v_{rx} = Qv_x$. The key point is the following:

Lemma 12.26. There exists a symmetric matrix $M \in M_n(\mathbb{R})$ such that
1) $v^tMQv \ge 0$ for all $v \in \mathbb{R}^n$.
2) There exists $s \in \mathbb{Q}(r)$ such that $v_s^tMv_s < 0$.

The proof of this lemma will be given in the next step. Let us see why
this finishes the proof of the theorem. Define the invariant of a rectangle $b \times h$
to be $v_b^tMv_h$. In the usual way, we obtain that this invariant is additive. The
invariant of a rectangle $b \times br$ is $v_b^tMv_{br} = v_b^tMQv_b \ge 0$ and the invariant of a
rectangle $br \times b$ is equal to that of a rectangle $b \times br$, as $M$ is symmetric. So
rectangles of side-ratio $r$ or $1/r$ have nonnegative invariant. On the other hand, 2)
ensures the existence of an $s \times s$ square with negative invariant. This square
cannot be tiled by rectangles with side-ratio $r$ and we are done.

Step 4. We prove the lemma. As $p$ has no multiple root (because it is
irreducible), $Q$ is diagonalizable over $\mathbb{C}$. As it has real entries, it is immediate
to deduce the existence of a block-diagonal matrix $D$ and of $P \in GL_n(\mathbb{R})$ such
that $Q = PDP^{-1}$. Moreover, each block of $D$ has size 1 (and corresponds to a
real eigenvalue of $Q$) or is of the form $\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}$ (and corresponds to complex
conjugate eigenvalues $\alpha \pm i\beta$ of $Q$). Let $T$ be the diagonal matrix whose entries
are the real parts of the eigenvalues of $Q$. We choose $M = (P^{-1})^tTP^{-1}$. Then
for all $v$ we have

$$v^tMQv = (P^{-1}v)^tTD(P^{-1}v) = \sum_i w_i^2 a_i \ge 0,$$

where $w_i$ are the coordinates of $P^{-1}v$ and $a_i$ are the squares of the real parts
of the eigenvalues of $Q$. So the first property is satisfied. On the other hand, $T$
has a negative entry (as $r$ has a conjugate of negative real part by step 2 and
the conjugates of $r$ are exactly the eigenvalues of $Q$), which we may assume
to be the first one. If $e_1$ is the column vector whose first coordinate is 1 and
the others are 0, we deduce that $(Pe_1)^tMPe_1 < 0$. Simply choose $v \in \mathbb{Q}^n$
sufficiently close to $Pe_1$ to ensure that $v^tMv < 0$ and then choose $s \in \mathbb{Q}(r)$
such that $v_s = v$. The lemma is proved. □

⁶ Recall that this means that $Q$ is the matrix of the multiplication by $r$, seen as an
endomorphism of the $\mathbb{Q}$-vector space $\mathbb{Q}(r)$.


12.7 Notes 


The solutions to some of the problems in this chapter are due to the 
following people: Alon Amit (problem 11), Tom Belulovich (problem 4), Aart 
Blokhuis (problem 27), Iurie Boreico (problem 15), Zeb Brady (problem 21), 
Alexandru Chirvăsitu (problem 13), Darij Grinberg (problems 12, 15, 19), 
Omid Hatami (problem 12), Xiangyi Huang (problems 15, 20), Laszlo Lovasz 
(problem 27), James Merryfield (problem 2), Jorge Miranda (problem 14), 
Fedja Nazarov (problems 7, 24, 29), Hunter Spink (problem 8), J. Steinhardt 
(problem 18), Gjergji Zaimi (problems 5, 7, 9, 10, 11, 20). 


Bibliography 


[1] M. Ajtai, V. Chvátal, M. Newborn, and E. Szemerédi. Crossing-free subgraphs.
North-Holland Mathematics Studies, 60:9-12, 1982.

[2] N. Alon and J. H. Spencer. The probabilistic method, volume 73 of Wiley-Interscience
in Discrete Mathematics and Optimization. John Wiley & Sons, Inc., 3rd edition, 2008.

[3] T. Andreescu and G. Dospinescu. Problems from the Book. XYZ Press, 2nd edition,
2010.

[4] N. C. Ankeny and C. A. Rogers. A conjecture of Chowla. The Annals of Mathematics,
Second Series, 53(3):541-550, May 1951.

[5] A. S. Besicovitch. On the linear independence of fractional powers of integers. Journal
of the London Mathematical Society, 15(1):3-6, 1940.

[6] M. Bhargava. The factorial function and generalizations. The American Mathematical
Monthly, 107(9):783-799, November 2000.

[7] B. Bollobas. Combinatorics: set systems, hypergraphs, families of vectors, and
combinatorial probability. Cambridge University Press, 1986.

[8] E. Bombieri. On the large sieve. Mathematika, 12(2):201-225, 1965.

[9] E. Bombieri. Le grand crible dans la théorie analytique des nombres. Number 18 in
Astérisque. Société mathématique de France, 2nd edition, 1974.

[10] I. Boreico. My favorite problem: linear independence of radicals. The Harvard College
Mathematics Review, 2(1), 2008.

[11] J. Bourgain, N. Katz, and T. Tao. A sum-product estimate in finite fields, and
applications. Geometric and Functional Analysis, 14(1):27-57, 2004.

[12] T. C. Brown and J. P. Buhler. A density version of a geometric Ramsey theorem.
Journal of Combinatorial Theory, Series A, 32(1):20-34, 1982.

[13] J. W. S. Cassels. Local fields, volume 3 of London Mathematical Society Student Texts.
Cambridge University Press, 1986.


[14] K. Chandrasekharan. The work of Enrico Bombieri. In R. D. James, editor, Proceed- 
ings of the International Congress of Mathematicians, volume 1, pages 3-10. Canadian 
Mathematical Congress, 1974. 


[15] F. R. K. Chung. Sphere-and-point incidence relations in high dimensions with
applications to unit distances and furthest-neighbor pairs. Discrete & Computational
Geometry, 4(1):183-190, 1989.


[16] F. R. K. Chung, E. Szemerédi, and W. T. Trotter. The number of different distances de- 
termined by a set of points in the Euclidean plane. Discrete & Computational Geometry, 
7(1):1-11, 1992. 


[17] F. Clarke and C. Jones. A congruence for factorials. Bulletin of the London Mathemat- 
ical Society, 36(4):553-558, 2004. 


[18] J. H. E. Cohn. On square Fibonacci numbers. Journal of the London Mathematical 
Society, s1-39(1):537-540, 1964. 


[19] A. Cojocaru and M. R. Murty. An introduction to sieve methods and their applications, 
volume 66 of London Mathematical Society Student Texts. Cambridge University Press, 
2005. 


[20] J. H. Conway and A. J. Jones. Trigonometric diophantine equations (on vanishing sums 
of roots of unity). Acta Arithmetica, 30(3):229-240, 1976. 


[21] L. E. Dickson. Introduction to the theory of numbers. Dover Publications, Inc. New 
York, 1957. 


[22] A. Dumitrescu, M. Sharir, and C. D. Tóth. Extremal problems on triangle areas in two 
and three dimensions. Journal of Combinatorial Theory, Series A, 116(7):1177-1198, 
2009. 


[23] G. Elekes. On the number of sums and products. Acta Arithmetica, 81(4):365-367, 
1997. 


[24] P. D. T. A. Elliott. The Turan-Kubilius inequality. Proceedings of the American
Mathematical Society, 65(1):8-10, July 1977.


[25] P. D. T. A. Elliott. Probabilistic number theory, volume 1, 2. Springer-Verlag, 1979, 
1980. 


[26] P. Erdős. A theorem of Sylvester and Schur. Journal of the London Mathematical
Society, s1-9(4):282-288, 1934.

[27] P. Erdős. On the greatest prime factor of $\prod_{k=1}^{x} f(k)$. Journal of the London
Mathematical Society, s1-27(3):379-384, 1952.


[28] P. Erdős and M. Kac. The Gaussian law of errors in the theory of additive number 
theoretic functions. American Journal of Mathematics, 62(1):738-742, 1940. 


[29] P. Erdős, A. Rényi, and V. T. Sós. On a problem of graph theory. Studia Sci. Math. 
Hungar, 1:215-235, 1966. 


[30] J. H. Evertse. The number of solutions of linear equations in roots of unity. Acta
Arithmetica, 89(1):45-51, 1999.

[31] B. Farhi. An identity involving the least common multiple of binomial coefficients and
its application. The American Mathematical Monthly, 116(9):836-839, 2009.

[32] H. Flanders. Generalization of a theorem of Ankeny and Rogers. The Annals of
Mathematics, Second Series, 57(2):392-400, March 1953.

[33] P. Frankl, R. L. Graham, and V. Rödl. On subsets of abelian groups with no three
term arithmetic progression. Journal of Combinatorial Theory, Series A, 45(1):157-161,
1987.

[34] C. Freiling and D. Rinne. Tiling a square with similar rectangles. Math. Research
Letters, 1:547-558, 1994.

[35] D. Grinberg. An inequality involving 2n numbers.
http://www.cip.ifi.lmu.de/~grinberg/Yugoslavia1998.pdf.

[36] L. Guth and N. H. Katz. On the Erdős distinct distance problem in the plane.
arXiv:1011.4105.

[37] H. Halberstam and H. E. Richert. Sieve Methods. Academic Press New York, 1974.

[38] D. Hanson. On the product of the primes. Canad. Math. Bull, 15:33-37, 1972.

[39] G. H. Hardy and S. Ramanujan. The normal number of prime factors of a number n.
Quart. Journal Math, 48:76-92, 1917.

[40] A. Hildebrand. On Wirsing's mean value theorem for multiplicative functions. Bulletin
of the London Mathematical Society, 18(2):147-152, 1986.

[41] C. Hooley. On the greatest prime factor of a quadratic polynomial. Acta Mathematica,
117(1):281-299, 1967.

[42] C. Huneke. The friendship theorem. The American Mathematical Monthly,
109(2):192-194, February 2002.

[43] K. F. Ireland and M. I. Rosen. A Classical Introduction to Modern Number Theory,
volume 84 of Graduate Texts in Mathematics. Springer, 1990.

[44] H. Iwaniec and E. Kowalski. Analytic Number Theory, volume 53 of Colloquium
Publications. American Mathematical Society, 2004.

[45] C. Lech. A note on recurring series. Arkiv for Matematik, 2(5):417-421, 1953.


[46] Y. V. Linnik. The large sieve. In Dokl. Akad. Nauk SSSR, volume 30, pages 292-294,
1941. in Russian.

[47] Y. V. Linnik. A remark on the least quadratic non-residue. In C. R. (Doklady) Acad.
Sci. URSS (N.S.), volume 36, pages 119-120, 1942.

[48] K. Mahler. On the fractional parts of the powers of a rational number. Acta Arithmetica,
3:89-93, 1938.


[49] K. Mahler. p-adic numbers and their functions., volume 76 of Cambridge Tracts in 
Mathematics. Cambridge University Press, 2nd edition, 1981. 


[50] H. B. Mann. On linear relations between roots of unity. Mathematika, 12(2):107-117, 
1965. 


[51] D. G. Mead and S. K. Stein. More on rectangles tiled by rectangles. The American 
Mathematical Monthly, 100(7):641-643, August-September 1993. 


[52] R. Meshulam. On subsets of finite abelian groups with no 3-term arithmetic progres- 
sions. J. Comb. Theory, Series A, 71(1):168-172, 1995. 


[53] P. Monsky. On dividing a square into triangles. The American Mathematical Monthly, 
77(2):161-164, February 1970. 


[54] P. Monsky. Simplifying the proof of Dirichlet’s theorem. The American Mathematical 
Monthly, 100(9):861-862, 1993. 


[55] H. L. Montgomery. Topics in Multiplicative Number Theory, volume 227 of Lecture 
Notes in Mathematics. Springer-Verlag, 1971. 


[56] H. L. Montgomery. The analytic principle of the large sieve. Bull. Amer. Math. Soc., 
84:547-567, 1978. 


[57] H. L. Montgomery and R. C. Vaughan. The large sieve. Mathematika, 20(2):119-134, 
1973. 


[58] L. J. Mordell. On the linear independence of algebraic numbers. Pacific Journal of 
Mathematics, 3(3):625-630, 1953. 


[59] M. R. Murty. Small solutions of polynomial congruences. Indian Journal of Pure and 
Applied Mathematics, 41(1):15-23, 2010. 


[60] T. Nagell. Généralisation d'un théorème de Tchebycheff. J. Math. Pures Appl., Série 
8, 4:343-356, 1921. 


[61] M. B. Nathanson. An exponential congruence of Mahler. The American Mathematical 
Monthly, 79(1):55-57, January 1972. 


[62] D. J. Newman. Analytic number theory, volume 177 of Graduate Texts in Mathematics. 
Springer Verlag, 1998. 


[63] P. J. O'Hara. Another proof of Bernstein’s theorem. The American Mathematical 
Monthly, 80(6):673-674, June-July 1973. 


[64] H. Pan and Z. W. Sun. A combinatorial identity with application to Catalan numbers. 
Discrete Mathematics, 306(16):1921-1940, 2006. 


[65] A. Postnikov. Intransitive trees. Journal of Combinatorial Theory, Series A, 79(2):360- 
366, August 1997. 


[66] V. V. Prasolov. Polynomials, volume 11 of Algorithms and Computation in
Mathematics. Springer-Verlag, 2009. 


[67] A. Robert. A course in p-adic analysis, volume 198 of Graduate Texts in Mathematics. 
Springer-Verlag New York, Inc., 2000. 


[68] A. Robert and M. Zuber. The Kazandzidis supercongruences. A simple proof and an 
application. Rendiconti del Seminario Matematico della Università di Padova,
94:235-243, 1995. 


[69] B. E. Sagan. Proper partitions of a polygon and k-Catalan numbers. Ars Combinatoria, 
88:109-124, 2008. 


[70] J. L. Selfridge and E. G. Straus. On the determination of numbers by their sums of a 
fixed order. Pacific Journal of Mathematics, 8(4):847-856, 1958. 


[71] I. E. Shparlinski. Exponential sums in coding theory, cryptology and algorithms. In 
H. Niederreiter, editor, Coding Theory and Cryptology, pages 323-383. World Scientific 
Publishing, Singapore, 2002. 


[72] J. Solymosi. On sum-sets and product-sets of complex numbers. Journal de Théorie 
des Nombres de Bordeaux, 17(3):921-924, 2005. 


[73] J. Solymosi. On the number of sums and products. Bulletin of the London Mathematical 
Society, 37(4):491-494, 2005. 


[74] J. Solymosi and V. Vu. Distinct distances in high dimensional homogeneous sets. In 
Towards a Theory of Geometric Graphs, volume 342 of Contemporary Mathematics, 
pages 259-268. American Mathematical Society, 2004. 


[75] J. Spencer, E. Szemerédi, and W. T. Trotter. Unit distances in the Euclidean plane. In 
B. Bollobás, editor, Graph theory and Combinatorics, pages 293-303. Academic Press, 
New York, 1984. 


[76] R. P. Stanley. Enumerative combinatorics, vol. II, volume 62 of Cambridge Studies in 
Advanced Mathematics. Cambridge University Press, 1999. 


[77] Z. W. Sun. On sums of binomial coefficients modulo p^2.
http://math.nju.edu.cn/~zwsun/. 


[78] Z. W. Sun and R. Tauraso. New congruences for central binomial coefficients. Advances 
in Applied Mathematics, 45(1):125-148, 2010. 


[79] Z. W. Sun and R. Tauraso. On some new congruences for binomial coefficients. Int. J. 
Number Theory, 7(3):645-662, 2011. 


[80] L. A. Székely. Crossing numbers and hard Erdős problems in discrete geometry.
Combinatorics, Probability and Computing, 6(3):353-358, 1997. 


[81] E. Szemerédi and W. T. Trotter. Extremal problems in discrete geometry.
Combinatorica, 3(3):381-392, 1983. 


[82] T. Tao and V. Vu. Additive combinatorics, volume 105 of Cambridge Studies in
Advanced Mathematics. Cambridge University Press, 2006. 


[83] R. Tauraso. An elementary proof of a Rodriguez-Villegas supercongruence. arXiv: 


[84] A. Weil. Numbers of solutions of equations in finite fields. Bulletin of the American 
Mathematical Society, 55(5):497-508, 1949. 


[85] E. Wirsing. Das asymptotische Verhalten von Summen über multiplikative Funktionen. 
Mathematische Annalen, 143(1):75-102, 1961. 


[86] E. Wirsing. Das asymptotische Verhalten von Summen über multiplikative Funktionen 
II. Acta Mathematica Hungarica, 18(3):411-467, 1967. 


Printed by “Combinatul Poligrafic” 


